ABSTRACT


Real-time interactive video applications, such as video teleconferencing, present 
difficult challenges to network designers due to strict quality of service constraints and 
the limitations of traditional video compression schemes. These limitations reveal 
themselves notably in two areas: poor error robustness and a lack of flexibility when 
dealing with multicast scenarios over heterogeneous networks. 

A more promising approach that improves error robustness while also offering a 
solution to the network heterogeneity problem is to employ a layered video codec. This 
thesis presents the implementation of a new layered video codec scheme. Block updating 
coupled with an aging algorithm is used in this scheme to select macroblocks for 
transmission. Block updating selects macroblocks that have changed due to scene 
motion, and the aging algorithm ensures that an entire frame is transmitted within a set 
time interval. Layering is accomplished through application of the fast Haar transform 
and/or the discrete cosine transform. Layer assignments are made by grouping bands of 
coefficients with similar variances. Quantization and encoding for motion video employ
both an industry-standard scheme and uniform quantization with a custom variable length
coding table. For static slides, uniform quantization and a second custom variable length
coding table are employed. Rate control is accomplished by reducing a four-dimensional
operational distortion surface to a one-dimensional optimal curve, implemented as a
simple table lookup of quantizers.
I. INTRODUCTION


A. BACKGROUND 


Video teleconferencing (VTC) is expected to contribute significantly to the US
Navy’s Information Technology for the 21st Century (IT-21) initiative. IT-21 seeks to
shift the warfighting paradigm from a platform-centric approach to a network-centric
approach, in which information superiority is leveraged with smart weapons to achieve
the desired result more effectively. The goal of IT-21 is to link all US forces in a
network that enables seamless transmission of voice, video, and data from individual
workstations to both local and remote users [1][2]. Employing VTC over a tactical
network at the battle group level supports several useful applications, such as
collaborative planning, distance learning, remote maintenance, and telemedicine.


1. Multimedia Communication and Tactical Video Teleconferencing (VTC)

In general, multimedia communications are either unicast or multicast. Unicast 
represents peer-to-peer communications while multicast represents m to n 
communications where m ranges from 1 to n. Unicast includes any client-server 
applications, such as video on demand (VOD) or IP telephony. Multicast examples 
include distance learning and remote conferencing. The tactical VTC scenario 
considered here is inherently a multicast application running over a heterogeneous, 
wireless network. Here, “heterogeneous” implies that a connection traverses a series of 
links (each imposing potentially different bandwidths), and the recipient workstations 
differ in capability and capacity; “wireless” implies an internetwork of wireline local area 
networks (LANs) connected by at least one wireless channel. In this setting, each
transmitter sends to multiple receivers in the multicast group. The multicast group
consists of some combination of active participants, which are allowed to transmit, and
passive participants, which only receive. This situation is illustrated in Figure I-1. The
related problem of transmitting multicast multimedia traffic over ATM networks was
examined in [3].

Figure I-1: Simple VTC Multicast with Two Active and Two Passive Nodes.


While the network described here affords large bandwidth within each shipboard
LAN, VTC realizes its true value only if implemented across the entire battle group.
Because the wireless channel constrains both the bit rate and the bit error rate (each of
which, in turn, impacts the robustness of the VTC application), the tactical VTC
network presents challenges not encountered with wireline networks and provides a
hostile environment for traditional video codecs.
2. Video Compression and Robustness 


Since video and, to a lesser degree, audio are bandwidth intensive, the signals
must be compressed prior to transmission, trading a reduction in bandwidth for a
reduction in quality. Because the human visual system (HVS) places greater relative
importance on lower frequencies than on higher frequencies, typical motion video is
perceived as primarily lowpass [4]. Therefore, lossless two-dimensional (2-D) transform
methods are used to create a frequency domain representation of an image. The
perceptual properties of the HVS can then be exploited by quantizing the resulting
coefficients to varying degrees of precision, with higher precision allotted to the lower
frequency coefficients. Quantization reduces the dynamic range of the coefficients and
discards information, but it allows fewer bits to represent the coefficients. Usually,
quantization zeroes out many of the higher frequency coefficients, creating runs of zeros.
Because these zeros need not be represented explicitly, run-length encoding (RLE) is
used to generate a more compact representation of the quantized coefficients, and the
resulting run-length symbols are then mapped to an efficient, lossless variable length
code (VLC). These techniques are collectively called spatial compression. Spatial
compression is the basis of image compression standards such as that of the Joint
Photographic Experts Group (JPEG).

A video codec may simply treat each frame of a video as a separate still image 
and subject it to spatial compression independently from the other frames. This approach 
is known as intraframe coding. An example is Motion-JPEG (M-JPEG). Intraframe 
coding has the advantage of superior error resilience since decode errors are confined 
strictly to the current frame. Its disadvantage is that, at acceptable image quality,
compression is limited to approximately 0.5 bits per pixel (bpp) [4].

For a given quality, higher compression gains are realized if the video codec 
exploits the high degree of correlation that video frames tend to exhibit on a frame-to- 
frame basis. This is called interframe coding, and it eliminates redundancy by coding 
only the differences in successive frames. The compression gains achieved by interframe 
coding vary in relation to the degree and type of motion that occurred between successive 
frames. Nearly static-content frames exhibit high correlation and result in high 
compression gain while highly dynamic-content frames have little correlation and result 
in less compression gain. If two successive frames have no correlation, perhaps due to a 
scene change, interframe coding performs no better than intraframe coding and typically 
performs worse due to the overhead required to track motion. The disadvantage of
interframe coding results from the dependency between frames at the decoder: if errors
occur in the current frame, they tend to propagate to subsequent frames as well as
spatially within frames [5][6]. Consequently, video codecs such as those of the Moving
Picture Experts Group (MPEG) incorporate both types of compression techniques in an
attempt to increase compression gain while bounding error propagation.
3. Traditional Video Codecs 


Low bit rate video coding standards such as H.261 and H.263 perform best in
homogeneous, unicast environments. The video server negotiates a desired Quality of
Service (QoS) consistent with the desired video quality and available bandwidth prior to 
delivery. Since network conditions are rarely static, the received video quality typically 
changes due to dropped or incomplete frames caused by losses within the network. With 
the implementation of a closed loop control scheme via feedback reports from the 
recipient, the server can react to the changing network conditions and adjust the 
quantization, frame size, or frame rate in order to vary the bit rate. 

However, when traditional video codecs are applied in a wireless, multicast, 
heterogeneous environment, shortcomings are revealed. First, the traditional scheme 
relies on guaranteed bandwidth for delivery, trading bandwidth for quality. Selecting an 
appropriate quality (and therefore required bandwidth) poses a dilemma in a 
heterogeneous network. Since each user is reached by a different path through the
network, each experiences a different level of congestion. Even with feedback, the
application has no single adjustment that suits every recipient. Sending
high quality, high bandwidth video supports some users but leaves low bandwidth
recipients with degraded video due to high packet loss. If the lowest common
denominator is supported instead, all recipients are forced to view lower quality video,
and the high bandwidth links are underutilized. Clearly, meeting the varied expectations
with a single video stream is impractical, and transmitting multiple video streams with
gradations in quality demands a much greater bandwidth expense.

Second, the poor error robustness demonstrated by traditional low bit rate video 
coders is especially troublesome since retransmission is not practical in a real-time 
application. Finally, feedback itself is undesirable in a low bit rate network. Feedback
employed with the goal of mitigating congestion actually consumes available bandwidth,
causes an additional load on constrained nodes, and increases congestion further.











4. Receiver-Based Layered Multicast (RLM)


A promising approach that offers a solution to the network heterogeneity problem
while also offering some improvement in error robustness is the receiver-based layered
multicast (RLM) scheme proposed by McCanne et al. [7]. RLM employs a layered video
codec that transmits video in scalable layers that progressively refine quality. An
independently decodable base layer guarantees a minimum acceptable quality, and
separate enhancement layers increase quality in a hierarchical manner. With RLM, each
recipient can decode just the base layer for low, but acceptable, quality or add one or
more enhancement layers to improve quality as bandwidth and hardware permit. This
idea is illustrated in Figure I-2.
Figure I-2: Overview of Layered Coder/Decoder.


RLM represents a starting point for designing an integrated approach to
improving robustness. The layered structure slightly reduces the effects of congestion
because a particular node needs to carry only subscribed layers; unsubscribed layers can
be dropped. Figure I-3 illustrates this concept. Furthermore, earlier work by Rhee and
Gibson indicates that layered video exhibits improved resilience to bit errors introduced
during transmission, because spreading bit errors across multiple layers has less negative
impact on the reconstructed video [8].
Figure I-3: Adapting to Network Heterogeneity Using RLM.
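As a rough illustration of the outcome shown in Figure I-3, the Python sketch below picks how many cumulative layers a receiver can subscribe to given a known link bandwidth. The function name `layers_to_subscribe` and the layer rates are hypothetical. RLM itself does not assume a known bandwidth figure; receivers discover spare capacity through join experiments and loss measurements [7], so this only models the steady-state result.

```python
def layers_to_subscribe(layer_rates_kbps, available_kbps):
    """Return how many layers (base layer first) fit within the available bandwidth.

    Layers are cumulative: enhancement layer k is useful only if layers
    0..k-1 are also received, so layers are added in order and the loop
    stops at the first one that would exceed the budget.
    """
    total = 0.0
    count = 0
    for rate in layer_rates_kbps:
        if total + rate > available_kbps:
            break
        total += rate
        count += 1
    return count


# Hypothetical layering: a 32 kbps base layer plus two 32 kbps enhancement layers.
rates = [32, 32, 32]
for link in (40, 70, 128):
    print(link, "kbps link ->", layers_to_subscribe(rates, link), "layer(s)")
# 1, 2 and 3 layers respectively
```

A receiver on a link slower than the base layer gets zero layers, which mirrors the requirement that the base layer alone carry a minimum acceptable quality.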





B. THESIS OBJECTIVE 


The main objective of this thesis is to implement in Matlab¹ the new layered video
coding scheme for tactical VTC proposed by Parker [9][10] for use in multicast,
heterogeneous, wireless, asynchronous transfer mode (ATM) networks. The tactical VTC
session is assumed to consist of low-motion video, such as a “talking head” in the style of 
a typical newscast and static displays in the style of an overhead presentation. No 
assumption is made about the relative proportions of the two types of content; the session 
could be all low-motion video, a sequence of static displays, or any combination of the 
two types in any order. 

The complete specifications of the coder are detailed in Table I-1 [9]. For the
implementation in this thesis, a color depth of 8-bit grayscale is evaluated, but the
technique can be extended to 4:2:0 subsampled 24-bit color. Audio compression to
8 kbps or less may be accomplished via code excited linear prediction (CELP) speech
coding, but it is not addressed further in this work. Error robustness is provided via a
block updating scheme that limits the impact of decoder errors. Layering is accomplished
by judiciously grouping the frequency domain content obtained from the fast Haar
transform (FHT) and/or the discrete cosine transform (DCT); the exact method of
frequency decomposition depends on the video content. Handling both motion video
and static slides with a single coder requires significant flexibility and compromise, since
the frequency characteristics of each are different. Therefore, the coder is optimized to
handle each type of content separately, with separate layering and separate, custom VLC
tables. Finally, the bit rate control issue is examined, and an approach that reduces an
n-dimensional rate control problem to a simple table lookup is implemented.

¹ Matlab is a registered trademark of The MathWorks, Inc.








VTC Stream   Parameter     Value
Video        Bandwidth     64-96 kbps
             Resolution    176x144 (QCIF)
             Frame Rate    10 fps
             Color Depth   8-bit gray / 4:2:0 24-bit color
Audio        Bandwidth     < 8 kbps

Table I-1: Tactical VTC Multimedia Requirements.








C. THESIS ORGANIZATION 


Chapter II considers techniques for coding video. A general discussion of video 
stream structural hierarchy, transform coding methods, quantization techniques, entropy 
encoding, frame coding, and quality measurement is presented. Chapter III presents the 
specific techniques utilized in coding both motion video and static slides. Chapter IV 
presents results. Conclusions and recommendations for future study are given in Chapter 
V. Appendix A and Appendix B provide the custom VLC tables used with motion video 
and static slides, respectively. Appendix C is a Matlab code library of the layered video
coder implementation.
II. LAYERED VIDEO CODER DESIGN CONSIDERATIONS


The basic components of a video coder are shown in Figure II-1. This chapter
begins with an explanation of video stream hierarchy and then discusses each of the
component parts displayed in Figure II-1. Motion compensation is present only if the
coder attempts to exploit frame-to-frame correlation; otherwise, each frame is processed
independently. The chapter concludes by addressing a method for quantifying the
amount of distortion introduced by the processing.
Figure II-1: Basic Components of a Video Coder.
A. VIDEO STREAM STRUCTURAL HIERARCHY


A video stream is organized into a hierarchy of logical components. Although the
specific organizational scheme varies with the particular coder under consideration, some
of the most common components are the following.

A frame or picture is the basic display unit. It is a sampled version of the original
scene taken at a particular instant in time. Each frame is composed of a rectangular array
of pixels, or points within the frame. Each pixel contains a data structure indicating its
luminance or color. The dimensions of the array define the picture resolution, given as
columns x rows, such as the ITU-T defined QCIF resolution of 176x144. Pixels within a
frame are organized, in order of increasing size, into blocks, macroblocks, and groups of
macroblocks (GOBs). A block is an 8x8 array of pixels; a macroblock is a 16x16 array
of pixels or, equivalently, four blocks. A frame may be considered as rows of
macroblocks. One or more contiguous rows of macroblocks are termed a GOB. This
hierarchy is illustrated in Figure II-2.


[Figure: a Group of Pictures containing Picture 1 through Picture m]

Figure II-2: Organizational Hierarchy for Compressed Video.
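To make the block and macroblock arithmetic concrete, the sketch below tiles a QCIF luminance frame into 8x8 blocks and 16x16 macroblocks. The helper `partition` is hypothetical and represents a frame as a plain list of pixel rows.

```python
def partition(frame, size):
    """Split a 2-D frame (list of pixel rows) into size x size tiles.

    Returns (tile_row, tile_col, tile) tuples in raster order.
    Both frame dimensions must be multiples of `size`.
    """
    rows, cols = len(frame), len(frame[0])
    if rows % size or cols % size:
        raise ValueError("frame dimensions must be multiples of the tile size")
    tiles = []
    for r in range(0, rows, size):
        for c in range(0, cols, size):
            tile = [row[c:c + size] for row in frame[r:r + size]]
            tiles.append((r // size, c // size, tile))
    return tiles


# A blank QCIF luminance frame: 144 rows x 176 columns.
qcif = [[0] * 176 for _ in range(144)]
blocks = partition(qcif, 8)         # 8x8 blocks
macroblocks = partition(qcif, 16)   # 16x16 macroblocks
print(len(blocks), len(macroblocks))  # 396 99
```

With one macroblock row per GOB, a QCIF frame therefore holds 9 GOBs of 11 macroblocks each.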


B. TRANSFORM CODING 


Although compression through direct scalar quantization of pixel values is
possible, it is inefficient. An alternative approach is to use transform coding. Because
contiguous pixels within a frame tend to be highly correlated, the application of a suitable
linear transform² to decorrelate the pixels yields two primary advantages. The first is a
property termed “energy packing” efficiency. The second is that the resulting
coefficients are more conducive to perceptual-based quantization schemes.


² The transform should be lossless and invertible. Lossless means that no information is lost through
application of the transform. Invertible means that the original information is recoverable.











A signal is decorrelated if the application of a transform causes the autocorrelation
matrix of the resulting coefficients to become diagonal; that is, the coefficients are
mutually uncorrelated. The optimal transform packs the energy (information) into the
minimum possible number of coefficients, resulting in the highest energy packing
efficiency. Arranging the N coefficients of a transformed block in decreasing order of
magnitude and retaining only the first k coefficients gives the least distortion, as measured
by mean squared error (MSE), compared to any other set of k coefficients. Similarly, a
given level of quantization of the decorrelated coefficients results in the least distortion of
the original data [11]. Because certain transform coefficients hold greater perceptual
relevance for the HVS, this dependency can be exploited by using a frequency-based
transform and then quantizing (a lossy process) with a step size that is smaller for
coefficients of greater perceptual importance and larger for less important ones.
1. Discrete Cosine Transform (DCT) 


Theoretically, the discrete-time Karhunen-Loeve transform (KLT) provides the
greatest energy packing efficiency [11]. However, two liabilities preclude the use of the
KLT in video compression. First, the KLT is extremely computationally intensive,
requiring on the order of N² operations. Second, the KLT is signal dependent, requiring a
separate eigenvector calculation for each transformed data block. Instead, transforms that
approach the energy packing efficiency of the KLT and possess more efficient algorithms
are used in video coders.

A widely used transform for image processing is the two-dimensional (2-D)
discrete cosine transform (DCT). The 2-D DCT provides energy packing performance
very close to that of the KLT and can be implemented with fast algorithms that reduce the
computational effort to the order of N log₂ N [4]. A frame is transformed by partitioning
it into NxN regions of pixels and applying the 2-D DCT to each individual region. N can
be any integer provided that both frame dimensions are integer multiples of N. Because
the correlation among contiguous pixels tends to decrease as the NxN region grows,
which in turn decreases the resulting compression gain, the typical NxN size used with
the 2-D DCT is 8x8 (a block).
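Denoting the original block as f(i,j) and the coefficient block as F(u,v), the 2-D DCT is
given by [4]

$$F(u,v) = \frac{2}{N}\,C(u)\,C(v)\sum_{i=0}^{N-1}\sum_{j=0}^{N-1} f(i,j)\cos\left[\frac{(2i+1)u\pi}{2N}\right]\cos\left[\frac{(2j+1)v\pi}{2N}\right], \tag{II-1}$$

where

$$C(x) = \begin{cases} \dfrac{1}{\sqrt{2}}, & x = 0,\\[4pt] 1, & \text{otherwise}. \end{cases} \tag{II-2}$$

The inverse DCT is given by

$$f(i,j) = \frac{2}{N}\sum_{u=0}^{N-1}\sum_{v=0}^{N-1} C(u)\,C(v)\,F(u,v)\cos\left[\frac{(2i+1)u\pi}{2N}\right]\cos\left[\frac{(2j+1)v\pi}{2N}\right]. \tag{II-3}$$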


The result is a block of 64 coefficients having a spatial frequency interpretation as 
shown in Figure II-3. The F(0,0) coefficient represents the DC contribution and the 
remaining 63 coefficients represent the AC contribution. The different elements of an 
image map into the frequency domain as shown in Figure II-4 [4]. The application of the 
2-D DCT to an 8x8 block, where there is typically little variation from pixel to pixel, 
leads to a predominance of lowpass content in the frequency domain. Given this 
condition, the magnitudes of the AC coefficients are largest in the region around the DC
coefficient and diminish with increasing spatial frequency.





[Figure: 8x8 coefficient grid with the DC coefficient at the top left and the AC coefficients elsewhere; horizontal frequency increases to the right and vertical frequency increases downward]

Figure II-3: Frequency Interpretation of DCT Coefficients.

Figure II-4: Structural Decomposition of Image Elements [4].
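The following pure-Python sketch evaluates Equations II-1 through II-3 directly, as a check on the definitions and on the energy packing behavior described above. The function names are hypothetical, and the direct double sum costs on the order of N⁴ operations per block, so it is for illustration only; practical coders use the fast N log₂ N algorithms mentioned earlier.

```python
import math


def _c(x):
    # Normalization factor C(x) from Equation II-2.
    return 1 / math.sqrt(2) if x == 0 else 1.0


def _basis(n):
    # cos((2i+1) u pi / 2N) for every frequency u (outer) and pixel index i (inner).
    return [[math.cos((2 * i + 1) * u * math.pi / (2 * n)) for i in range(n)]
            for u in range(n)]


def dct2(f):
    """Forward 2-D DCT of an N x N block, Equation II-1."""
    n = len(f)
    cos = _basis(n)
    return [[(2 / n) * _c(u) * _c(v) *
             sum(f[i][j] * cos[u][i] * cos[v][j]
                 for i in range(n) for j in range(n))
             for v in range(n)] for u in range(n)]


def idct2(F):
    """Inverse 2-D DCT, Equation II-3."""
    n = len(F)
    cos = _basis(n)
    return [[(2 / n) *
             sum(_c(u) * _c(v) * F[u][v] * cos[u][i] * cos[v][j]
                 for u in range(n) for v in range(n))
             for j in range(n)] for i in range(n)]


# A smooth 8x8 block: a gentle horizontal ramp around mid-gray (mean 127).
block = [[120 + 2 * j for j in range(8)] for _ in range(8)]
F = dct2(block)
back = idct2(F)
print(round(F[0][0], 3))  # 1016.0, i.e. 8 x the block mean for N = 8
```

Because the ramp varies only horizontally, all of its energy lands in the first row of F, and nearly all of it in F(0,0) and F(0,1); the inverse transform recovers the block to within floating-point rounding.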


2. Discrete Wavelet Transform (DWT) 


Wavelet coding is another technique for compressing still images and has been
shown to offer slightly better image quality than DCT-based schemes at similar levels of
compression, at the cost of greater computational complexity [12]. Applying a wavelet
transform to an image transforms that image into a “multifrequency component
representation,” where each component has its own frequency characteristics and spatial
features.³ A discrete wavelet transform (DWT) filters and decimates an image into
separate subbands of coefficients containing a mixture of the high frequency and low
frequency details. Image decomposition is accomplished via two analysis filters. The
first extracts the low frequency content, the signal average; the second extracts the high
frequency content, the signal details.

³ The wavelet transform, like the 2-D DCT, is lossless and invertible.

The simplest example of a DWT is the fast Haar transform (FHT) [13]. It is 
described by its signal average equation,

$$a_1(n) = \frac{1}{\sqrt{2}}\bigl(x_0(2n) + x_0(2n+1)\bigr), \tag{II-4}$$

and its signal detail equation,

$$d_1(n) = \frac{1}{\sqrt{2}}\bigl(x_0(2n) - x_0(2n+1)\bigr), \tag{II-5}$$

where $x_0$ is the original data vector, and vectors $a_1$ and $d_1$ are the first order analysis
(decomposition) vectors. If $x_0$ contains $2^m$ elements, applying Equations II-4 and II-5
generates analysis vectors of length $2^{m-1}$.
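A minimal sketch of one FHT stage follows, assuming the 1/√2 scaling shown in Equations II-4 and II-5, under which the transform preserves signal energy. The synthesis function inverts the stage exactly, illustrating the lossless, invertible property noted in the footnote; both function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2)


def haar_analysis(x):
    """One FHT stage: signal averages (Eq. II-4) and signal details (Eq. II-5)."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    half = len(x) // 2
    a = [(x[2 * n] + x[2 * n + 1]) / SQRT2 for n in range(half)]
    d = [(x[2 * n] - x[2 * n + 1]) / SQRT2 for n in range(half)]
    return a, d


def haar_synthesis(a, d):
    """Invert one stage: x(2n) = (a + d) / sqrt2 and x(2n+1) = (a - d) / sqrt2."""
    x = []
    for an, dn in zip(a, d):
        x.extend([(an + dn) / SQRT2, (an - dn) / SQRT2])
    return x


x0 = [4, 6, 10, 12, 8, 6, 5, 5]   # 2^3 samples
a1, d1 = haar_analysis(x0)        # two vectors of 2^2 samples each
restored = haar_synthesis(a1, d1)
```

Smooth stretches of the input produce small details (the final pair 5, 5 gives a detail of exactly zero), which is what makes the detail vectors cheap to code.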

Referring to Figure II-5, the LL subband is calculated by applying the average 
equation to the columns and rows of the image. The LL subband retains the lowpass 
information from the original image and presents a coarse representation of the original 
image. Because typical images have lowpass characteristics, most of the energy from the 
image is represented in this subband. The HL subband is calculated by applying the 
average equation to the columns of the image and the detail equation to the rows. The 
HL subband retains the vertical edge details from the original image. Typically, less 
energy from the original image is represented in the HL subband compared to that of the 
LL subband. The LH and HH subbands are found analogously, with LH retaining
horizontal edge details and HH retaining diagonal edge details. The HH subband
typically has the lowest energy content. In fact, in some applications the HH subband is
discarded in its entirety [14].








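[Figure: second-order decomposition in which the LL subband is split again into LLLL, LLLH, and related subbands]

Figure II-5: Octave-Based Wavelet Decomposition.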





In order to differentiate the frequency content of an image further, a second DWT
can be applied. Still referring to Figure II-5, the effect of a second order octave-based
decomposition, obtained by applying the same analysis filters to the LL subband, is
illustrated. This increase in the number of subbands allows additional stages in the coder
to be tailored to emphasize perceptually important details over less perceptually relevant
background noise, as will be discussed in greater detail later.
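The sketch below applies the FHT along each row and then along each column to produce the four subbands described above, then repeats the step on LL for the second octave. It assumes the 1/√2 scaling of Equations II-4 and II-5 and reads "detail applied to the rows" as differencing horizontally along each row, so HL responds to vertical edges as the text states. All names are hypothetical.

```python
import math


def _haar_step(x):
    # Equations II-4 and II-5 applied to one 1-D sequence.
    s = 1 / math.sqrt(2)
    half = len(x) // 2
    a = [s * (x[2 * n] + x[2 * n + 1]) for n in range(half)]
    d = [s * (x[2 * n] - x[2 * n + 1]) for n in range(half)]
    return a, d


def _transpose(m):
    return [list(col) for col in zip(*m)]


def _filter_columns(m):
    # Run the 1-D step down every column; return (average, detail) halves.
    cols = [_haar_step(col) for col in _transpose(m)]
    return _transpose([a for a, _ in cols]), _transpose([d for _, d in cols])


def haar2d(image):
    """One-level 2-D FHT returning the LL, HL, LH and HH subbands."""
    row_avg, row_det = zip(*(_haar_step(row) for row in image))
    LL, LH = _filter_columns(row_avg)   # horizontal average, vertical avg/detail
    HL, HH = _filter_columns(row_det)   # horizontal detail, vertical avg/detail
    return LL, HL, LH, HH


# 8x8 test image with a single vertical edge between columns 2 and 3.
image = [[0 if c < 3 else 100 for c in range(8)] for _ in range(8)]
LL, HL, LH, HH = haar2d(image)
# Second octave (Figure II-5): decompose the LL subband again.
LLLL, LLHL, LLLH, LLHH = haar2d(LL)
```

For this image only HL (and LL) carry energy: the vertical edge shows up as a column of -100 values in HL, while LH and HH are zero.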
3. Comparison of Transforms 


Since the 2-D DCT is applied to pixel blocks within a frame that are subsequently 
quantized, it has a tendency to create “blocking” artifacts that disturb the continuity of the 
reconstructed frame. The same effect also leads to the presence of “ringing” artifacts 
around sharp edges. In contrast, wavelet transforms typically are applied to the entire
image and separate it into regions of high and low frequency content. These regions may
be quantized and coded independently, which permits more efficient bit allocation. This
produces a more visually pleasing, smoother reconstructed image compared to that
obtained via a 2-D DCT at a comparable peak signal-to-noise ratio (pSNR). In general, at
comparable pSNRs, wavelet transform coders offer compression gains superior to those
of DCT-based coders [12].

However, several drawbacks have limited the utility of wavelet-based video
compression. Wavelets achieve quality superior to DCT-based methods by processing the
entire image or frame, but traditional video coders exploit temporal correlation at a
subframe (usually macroblock) level. Although the prediction error block could be
transformed via a DWT, no significant advantage over the DCT has been demonstrated,
and the computational effort is greater [12]. Nevertheless, the frequency decomposition
offered by DWTs provides a powerful basis for layered video coding.
C. QUANTIZATION TECHNIQUES 


After the transform, the coefficients are quantized. Quantization is a lossy 
process used to reduce precision and zero out some coefficient values. The benefit is that 
each coefficient can then be represented with fewer bits. However, the information loss 
is not recoverable, and the loss is manifested as distortion within the reconstructed image. 
This distortion is termed quantization noise. 

The typical quantization scheme, uniform quantization, involves dividing each
coefficient $F_{uv}$ by the quantizer step size $Q_{uv}$ and rounding the result to the nearest
integer to produce a quantized coefficient $F^{Q}_{uv}$ as follows [15]:

$$F^{Q}_{uv} = \mathrm{nearest\ integer}\left(\frac{F_{uv}}{Q_{uv}}\right). \tag{II-6}$$

The values used in image reconstruction are then $F^{Q}_{uv}$ multiplied by the step size $Q_{uv}$.
As Equation II-6 implies, the step size may be adjusted for each particular coefficient,
with $Q_{uv}$ then representing the elements of a quantizer matrix. Choosing the
appropriate step size involves a trade-off between acceptable error and desired
compression. A small step size yields low quantization noise but little compression; the
opposite is the case for a large step size. Although uniform quantization is often used,
the choice is suboptimal since the individual coefficients are not distributed uniformly
[16]. Two broad approaches are employed to refine the selection of step size: HVS
weighting and bit allocation strategies. HVS weighting heuristically refines the step size
based on perceptual relevance, while bit allocation strategies attempt to spread errors
optimally across all coefficients.
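Equation II-6 and the matching reconstruction step can be sketched as follows. The coefficient values are hypothetical, and Python's built-in `round()` breaks exact ties toward the even integer, which is one of several tie rules a coder might adopt.

```python
def quantize(F, Q):
    """Equation II-6: divide each coefficient by its step size and round."""
    return [[round(f / q) for f, q in zip(f_row, q_row)]
            for f_row, q_row in zip(F, Q)]


def dequantize(FQ, Q):
    """Reconstruction: each quantized coefficient times its step size."""
    return [[fq * q for fq, q in zip(fq_row, q_row)]
            for fq_row, q_row in zip(FQ, Q)]


# Hypothetical 2x2 corner of a DCT block with a uniform step size of 10.
F = [[1016.0, -23.0], [7.0, 4.0]]
Q = [[10, 10], [10, 10]]
FQ = quantize(F, Q)         # [[102, -2], [1, 0]]
F_hat = dequantize(FQ, Q)   # [[1020, -20], [10, 0]]
```

The small coefficient 4 is zeroed, and every reconstruction error (quantization noise) is at most half the step size.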
1. Human Visual System (HVS) Weighting


Because the HVS places greater relative importance on lower frequencies than on
higher frequencies, step sizes based on HVS modeling are selected such that lower
frequency coefficients are quantized more finely while higher frequency coefficients are
quantized more coarsely [4]. The HVS is also more sensitive to luminance than to
chrominance; therefore, different quantizer matrices are developed for each. An
example of a luminance quantization matrix widely used in JPEG compression is given in
Figure II-6 [4]. Its dimensions match the 8x8 block size used with the 2-D DCT. When a
quantizer matrix is applied, the entire matrix can be scaled by a multiplicative constant in
order to trade quantization noise against compression while maintaining the relative
importance among the coefficients.


16 11 10 16 24 40 51 61 
12 12 14 19 26 58 60 55 
14 13 16 24 40 57 69 56
14 17 22 29 51 87 80 62 
18 22 37 56 68 109 103 77 
24 35 55 64 81 104 113 92 
49 64 78 87 103 121 120 101 
72 92 95 98 112 100 103 99 


Figure II-6: HVS-Based Luminance Quantization Matrix [4]. 
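The sketch below applies the Figure II-6 matrix scaled by a single multiplicative constant, as described above. The helper names, the clamp of each step size to at least 1, and the flat test block are assumptions for illustration; JPEG itself maps a quality setting to a scale factor by its own rule, which is not reproduced here.

```python
JPEG_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def scaled_matrix(scale):
    """Multiply every step size by one constant; clamp at 1 to avoid zero steps."""
    return [[max(1, round(q * scale)) for q in row] for row in JPEG_LUMA]


def quantize_block(F, scale=1.0):
    """Quantize an 8x8 DCT block with the scaled HVS matrix (Equation II-6)."""
    Q = scaled_matrix(scale)
    return [[round(F[u][v] / Q[u][v]) for v in range(8)] for u in range(8)]


def nonzero_count(block):
    return sum(1 for row in block for x in row if x != 0)


# Hypothetical block with the same magnitude at every frequency.
F = [[100.0] * 8 for _ in range(8)]
counts = [nonzero_count(quantize_block(F, s)) for s in (1.0, 2.0, 4.0)]
print(counts)
```

As the scale grows, the high-frequency entries (large step sizes) are zeroed first, so compression increases while the low-frequency coefficients keep their relative precision.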


2. Bit Allocation 


Using a bit allocation approach, the step size is chosen to minimize distortion
within a bit budget. This is a classical resource allocation problem, where the
fundamental trade-off in quantization is between rate (number of bits) and distortion
(approximation error), formalized as rate-distortion theory. Several sophisticated
algorithms apply Lagrangian methods to arbitrary rate-distortion curves. A popular
approach is to vary the step size in proportion to the variance of the coefficient [17].
However, bit allocation schemes, regardless of the methodology, do not account for the
varying sensitivity of human visual perception to different spatial frequencies.
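As one concrete instance of variance-driven allocation, the sketch below uses the textbook high-resolution rule b_k = R + ½·log₂(σ_k² / G), where R is the average bit budget per coefficient and G is the geometric mean of the variances. It is offered as a representative scheme, not as the exact method of [17]. The rule can return negative allocations for very small variances; practical allocators clip these at zero and redistribute the surplus, which this sketch omits.

```python
import math


def allocate_bits(variances, avg_bits):
    """High-resolution rule: b_k = R + 0.5 * log2(var_k / geometric mean)."""
    n = len(variances)
    log_gm = sum(math.log2(v) for v in variances) / n   # log2 of the geometric mean
    return [avg_bits + 0.5 * (math.log2(v) - log_gm) for v in variances]


# Hypothetical coefficient variances, from a low-frequency band to a high one.
bits = allocate_bits([16.0, 4.0, 1.0, 0.25], avg_bits=2)
print(bits)  # [3.5, 2.5, 1.5, 0.5]
```

Each factor of 4 in variance earns one extra bit, and the allocations still average to the 2-bit budget.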
D. ENTROPY ENCODING 


Further compression can be realized through a lossless process called entropy 
encoding that removes redundant information. The simplest style of entropy encoding is 
run-length encoding (RLE). With RLE, a data set is parsed to locate sequences of 
repeated values. Any such sequence is replaced by a codeword consisting of a delimiter, 
the value, and the number of times the value is repeated. The longer the sequence of the 
repeated value, the greater the available compression. Following quantization, many 
elements of a coefficient block typically have a value of zero. Often many high 
frequency rows and columns are completely composed of zeros. Accordingly, it is 
advantageous to scan the coefficient block in a manner as to produce the longest runs of 
zeros. In JPEG compression, scanning the quantized coefficient block as a vector in
zigzag fashion, starting with the DC coefficient and ending with the F(7,7) coefficient,
has been shown to increase the run-length of zeros [4]. Since the repeated value here is
known (0), an adaptation of the basic codeword scheme cited above consists of the
run-length of zeros followed by the value of the next non-zero coefficient. If there is no
remaining non-zero value in the block, a special end-of-block (EOB) codeword is used instead.
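A sketch of the zigzag scan and the (zero-run, value) codewords with an EOB marker follows. The function names are hypothetical, and for simplicity the DC coefficient is coded like any other, whereas JPEG codes it separately.

```python
def zigzag_order(n=8):
    """(row, col) pairs in JPEG-style zigzag order, DC first and F(n-1,n-1) last."""
    # Walk anti-diagonals s = row + col; odd diagonals run top-right to
    # bottom-left (row increasing), even diagonals run the other way.
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))


def run_length_encode(block):
    """Zigzag-scan a quantized block into (zero_run, value) pairs plus EOB."""
    seq = [block[i][j] for i, j in zigzag_order(len(block))]
    # Position of the last non-zero coefficient; the trailing zeros become EOB.
    last = max((k for k, x in enumerate(seq) if x != 0), default=-1)
    codewords, run = [], 0
    for x in seq[:last + 1]:
        if x == 0:
            run += 1
        else:
            codewords.append((run, x))
            run = 0
    codewords.append("EOB")
    return codewords


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0], block[0][2] = 6, -2, 1, 3
print(run_length_encode(block))  # [(0, 6), (0, -2), (0, 1), (2, 3), 'EOB']
```

The 59 trailing zeros of this block collapse into a single EOB symbol, which is why the zigzag order, by gathering high frequencies at the end, pays off.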
After RLE, the quantized coefficient block is represented by a set of codewords,
where each codeword represents a symbol drawn from a larger source alphabet.
Variable-length coding (VLC) minimizes the average codeword length by assigning shorter
codewords to the most probable symbols and longer codewords to the least probable
symbols, while keeping the code uniquely decipherable (UD). Huffman coding is the most
widely used entropy-encoding algorithm and is guaranteed to produce a minimum average
length UD code [15]. The Huffman algorithm uses each symbol’s probability of occurrence
to build a prefix code from an optimal binary-branching tree. Since the coder and decoder
must use the same code table and generating a Huffman table is computationally expensive,
standard tables are normally pre-defined using data drawn from test images. Optimal
compression is no longer guaranteed, but encoding and decoding are faster, and the need
to transmit the VLC table is avoided.
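The Huffman construction can be sketched with Python's standard `heapq` module: repeatedly merge the two least probable subtrees, prefixing their codewords with 0 and 1. The symbols and probabilities below are hypothetical stand-ins for run-length codewords, not values from the custom VLC tables.

```python
import heapq
from itertools import count


def huffman_code(probs):
    """Build a binary prefix code from a {symbol: probability} mapping."""
    if len(probs) == 1:
        return {next(iter(probs)): "0"}
    tiebreak = count()  # unique integers keep heap comparisons off the dicts
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees.
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical run-length symbols and their probabilities.
probs = {"(0,1)": 0.5, "EOB": 0.25, "(1,1)": 0.125, "(0,2)": 0.125}
code = huffman_code(probs)
avg_len = sum(p * len(code[s]) for s, p in probs.items())
print(code, avg_len)
```

Because these probabilities are powers of two, the 1.75-bit average codeword length equals the source entropy; with general probabilities Huffman coding lies within one bit of it.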


E. FRAME CODING 


As discussed previously, intraframe coding compresses each frame of a video as a 
separate, still image. Its advantage is superior error resilience; its disadvantage is limited 
compression gain. Interframe coding exploits frame-to-frame correlation by coding only 
the difference in successive frames, thereby allowing the potential for higher compression 
gain than that achieved by intraframe coding alone. The disadvantage to interframe 
coding is in robustness since decode errors can propagate between frames and spatially 
within a frame depending on the algorithm. 

Because of the compression required for the target VTC scenario, strict intraframe 
coding is not an option. Therefore, a robust algorithm for interframe coding is needed. 
Several source-coding techniques used to exploit temporal redundancy, such as motion 
compensation, differential pulse code modulation (DPCM) and block updating, are 
discussed in [4]. For a robust application in a real-time, heterogeneous, multicast 
environment, block updating was judged the most promising and is discussed further. 

Interframe coding via block⁴ updating is a variation on intraframe coding. With block updating, each block in the current frame is compared to the corresponding block in the previous frame, and a distance metric is calculated. If the metric is above a threshold value, that block is intracoded and transmitted. Otherwise, no further consideration is given to that block; it is skipped. Thus, block updating conserves bandwidth by coding and transmitting only those blocks that have changed perceptually since the previous frame [18]. In low-motion video, such as the “talking head” scenario, motion is confined to a relatively small region within a frame and the background remains static, so a significant bandwidth savings is possible.

⁴ In this context, “block” implies an N×N pixel region, not necessarily 8×8.
In order for block updating to be effective, a suitable distance metric must be defined. Common distance metrics are the MSE, the sum of absolute differences (SAD), and the absolute sum of differences (ASD) [4] [14].
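The following minimal Python sketch shows how the three metrics differ on the same pair of blocks and how a threshold turns a metric into a block-updating decision. Blocks are plain lists of rows, and the helper names are hypothetical.

```python
def _diffs(cur, ref):
    """Pixel-by-pixel differences between two equal-size blocks."""
    return [c - r for crow, rrow in zip(cur, ref) for c, r in zip(crow, rrow)]


def mse(cur, ref):
    """Mean squared error: one multiplication per pixel."""
    d = _diffs(cur, ref)
    return sum(x * x for x in d) / len(d)


def sad(cur, ref):
    """Sum of absolute differences: one absolute value per pixel."""
    return sum(abs(x) for x in _diffs(cur, ref))


def asd(cur, ref):
    """Absolute sum of differences: a single absolute value of the summed difference."""
    return abs(sum(_diffs(cur, ref)))


def block_changed(cur, ref, metric, threshold):
    """Block updating decision: intracode and send the block only if the metric exceeds the threshold."""
    return metric(cur, ref) > threshold
```

Suppose two pixels change by +2 and -2. This gives an MSE of 2.0 and a SAD of 4 for a 2x2 block, but an ASD of 0. Changes of opposite sign cancel in the ASD, which is the smoothing behavior discussed later for the coder's own metric choice.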

Utilizing block updating alone for a tactical VTC, where end-users may join a session that is already in progress, has a liability. End-users joining late will never receive a block that does not exceed the selection threshold, leaving them with a partially reconstructed video. To mitigate this effect, block updating can be combined with an aging algorithm that periodically forces block updates. Such an algorithm guarantees full scene reception within some set interval [14].
F. MEASURING QUALITY


Given that video coders trade compression gain for image quality, quantifying the level of distortion introduced is useful in evaluating different coding schemes. A useful measure of image distortion $D$ is the MSE between the original ($x$) and reconstructed ($\hat{x}$) $M \times N$ images [4]:

$$D = \frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N}\left(x_{m,n}-\hat{x}_{m,n}\right)^{2} \qquad \text{(II-7)}$$


Using distortion $D$, the signal-to-noise ratio (SNR) is determined as

$$\mathrm{SNR} = 10\log_{10}\frac{\sigma^{2}}{D} \qquad \text{(II-8)}$$


where $\sigma^{2}$ is the variance of the original image. The most widely published measure of image quality is the peak signal-to-noise ratio, given by

$$\mathrm{PSNR} = 10\log_{10}\frac{K^{2}}{D} \qquad \text{(II-9)}$$

where $K$ is the maximum peak-to-peak value in the image, 255 for the typical 8-bit image. For example, a typical PSNR for a JPEG-encoded grayscale image is 28 dB at 0.5 bpp.
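Equations II-7 through II-9 translate directly into code. This Python sketch assumes images are lists of rows and uses the population variance for $\sigma^{2}$. It returns infinity when $D = 0$, which is a convention chosen here rather than one taken from the text.

```python
import math


def distortion(orig, recon):
    """Equation II-7: mean squared error over an M x N image."""
    M, N = len(orig), len(orig[0])
    return sum((orig[m][n] - recon[m][n]) ** 2 for m in range(M) for n in range(N)) / (M * N)


def variance(img):
    """Population variance of all pixel values."""
    vals = [v for row in img for v in row]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def snr_db(orig, recon):
    """Equation II-8."""
    d = distortion(orig, recon)
    return math.inf if d == 0 else 10 * math.log10(variance(orig) / d)


def psnr_db(orig, recon, peak=255):
    """Equation II-9, with K = peak (255 for 8-bit images)."""
    d = distortion(orig, recon)
    return math.inf if d == 0 else 10 * math.log10(peak ** 2 / d)
```

Consider the 2x2 image [[0, 255], [255, 0]] reconstructed as [[1, 254], [254, 1]]. Here D = 1, and the PSNR is 10 log10(255²), about 48.13 dB.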

Using MSE as a measure of image quality does have drawbacks. MSE does not directly measure perceptual quality, since all errors are given equal weight regardless of how visible they are. Two compression techniques yielding the same MSE for an image may therefore still differ in perceptual quality.
III. LAYERED VIDEO CODECS


A. LAYERED CODING SCHEMES 


Parker [9] discusses several previous approaches to layered video coding. They 
can be classified into three fundamental categories: component-based layering, spatial- 
based layering, and frequency-based layering. 

Component-based layered coders transmit a base layer and usually a single 
enhancement layer. A traditional DCT-based coder (yielding some set quality) supplies 
the base layer. The enhancement layer improves this quality by either providing 
augmenting information to the base layer or correcting distortion present within the base 
layer. Two such schemes are proposed by Rhee and Gibson [8] [20]. Implicit in the
component-based approach is that the enhancement layer may duplicate information
already present in the base layer.

Spatial-based layered coders partition a frame into hierarchical areas of interest and encode each area separately. For example, the “talking head” frame shown in Figure III-1 may be partitioned into two areas of interest, with the speaker being of primary interest and the remainder of the frame being of secondary interest. With spatial-based layering, relatively more bandwidth would be allocated to the region encompassing the speaker, with the balance of available bandwidth being apportioned to the rest of the frame. Bahl and Hsu [21] have proposed a coder that incorporates content-sensitive spatial decomposition and multiresolution coding. The difficulty with implementing a spatial-based coder is creating the hierarchical areas of interest dynamically while minimizing the overhead required to identify their shifts relative to the layer assignments.


Figure III-1: Sample “Talking Head” Video Frame.


Frequency-based layered coders decompose a frame into subbands and then arrange the subbands into individual layers. The frequency decomposition may be applied to the entire frame or to individual macroblocks within the frame. Each layer may contain one or more subbands. The base layer contains at least the low frequency subband but may also contain higher frequency subbands in order to improve base layer quality. Figure III-2 illustrates this concept using the DWT to decompose a frame: the LL subband alone constitutes the base layer, the LH and HL subbands are combined into a first enhancement layer, and the HH subband constitutes the second enhancement layer.
Figure III-2: Basic Layered Video Coder Using the DWT.
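The grouping in Figure III-2 can be sketched in Python as a one-level 2-D Haar decomposition followed by a fixed subband-to-layer map. This is an illustration rather than the coder's implementation, and it rests on several assumptions:

- The Haar step stores pairwise averages and half-differences, an assumed normalization.
- The quadrant labels (LH in the northeast, HL in the southwest) are an assumed convention, since subband naming varies between authors.
- The function names are hypothetical.

```python
def haar_1d(v):
    """One Haar analysis step: pairwise averages, then pairwise half-differences."""
    half = len(v) // 2
    avg = [(v[2 * i] + v[2 * i + 1]) / 2 for i in range(half)]
    det = [(v[2 * i] - v[2 * i + 1]) / 2 for i in range(half)]
    return avg + det


def haar_2d(block):
    """Separable one-level 2-D Haar: transform every row, then every column."""
    rows = [haar_1d(r) for r in block]
    cols = [haar_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row order


def quadrants(coeffs):
    """Split a transformed block into its four subbands."""
    h, w = len(coeffs) // 2, len(coeffs[0]) // 2
    ll = [r[:w] for r in coeffs[:h]]
    lh = [r[w:] for r in coeffs[:h]]  # northeast quadrant
    hl = [r[:w] for r in coeffs[h:]]  # southwest quadrant
    hh = [r[w:] for r in coeffs[h:]]
    return ll, lh, hl, hh


def three_layers(block):
    """Group subbands as in Figure III-2: LL | LH + HL | HH."""
    ll, lh, hl, hh = quadrants(haar_2d(block))
    return {"base": [ll], "enhancement_1": [lh, hl], "enhancement_2": [hh]}
```

A flat block puts all of its content in the base layer. A one-pixel checkerboard, the highest-frequency pattern possible, lands entirely in the second enhancement layer.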


The principal advantages of frequency-based layering are its extensibility and flexibility. Further decomposing the frame by applying the DWT to each subband depicted in Figure III-2 creates a total of 16 subbands, and repeated application of the DWT to these subbands creates even more. With a larger number of subbands, there is greater flexibility in how they can be assigned to layers. This, in turn, facilitates exploiting each subband’s perceptual importance. Additionally, a greater number of subbands permits a greater possible number of layers. Three coders that utilize the frequency-based approach are discussed in [14], [22], and [23].


B. A LOW-COMPLEXITY, ADAPTIVE CODER FOR TACTICAL VTC 


As cited above, several diverse approaches to designing layered coders have been proposed; each emphasizes different network architectures or applications to varying degrees. However, there is neither a consensus on a preferred, structured approach nor a consensus on how to quantify the parameters that make a layered coder effective. This section presents the development and implementation of a new layered coder that is specifically tailored for the tactical VTC scenario, where limited transmission bandwidth is available and the robustness of transmission is at a premium.

Both the characteristics of the tactical VTC application and the desire to quantify 
effectiveness guide the strategy for developing and implementing the new layered coder. 
Specifically, the application yields five requirements. The coder must 1) provide a video 
stream characterized by the bandwidth, resolution, frame rate, and color depth detailed in 
Table I-1, 2) optimize compression adaptively for both low motion video and static 
slides, 3) provide a low complexity architecture to minimize coding delays and power 
requirements, 4) provide error resilient decoding at high packet loss rates, and 5) 
constrain bit rate to a predetermined average. 

The guiding factors stemming from the desire to quantify effectiveness are twofold. First, the coder must provide a base layer with acceptable quality and two (or more) enhancement layers that progressively improve perceptual quality. Second, the coder must minimize the bitstream overhead required to accommodate the layering structure. A functional diagram of the implemented coder is shown in Figure III-3. Each component is addressed in the following sections.
Figure III-3: Functional Block Diagram of the Proposed Layered Coder.


1. Temporal Compression via Block Updating 


Temporal compression is accomplished via block updating applied at the macroblock level. Only those macroblocks exhibiting sufficient change on a frame-to-frame basis are encoded. Block updating has been shown to yield inferior compression performance relative to motion prediction algorithms. It was nevertheless chosen because the performance differential is small when low-activity video is considered [24], and because it is more robust: temporal error propagation is greatly limited and spatial error propagation is eliminated. Block updating also removes the need for the locally decoded reference frame common to motion prediction schemes. This greatly simplifies the coder by eliminating the inverse quantization and inverse transform that the coder would otherwise have to perform. Block selection, as considered here, applies solely to motion video sequences. Since static slide sequences exhibit little or no motion, block selection via motion detection is of limited utility there. Indeed, transmissions during static sequences result from the considerations presented in the next section, where macroblock aging is discussed.


Sufficient change between corresponding, frame-successive macroblocks is determined by calculating a distance metric between them and comparing the result to a threshold. The distance metric utilized is the non-normalized ASD, given by

$$\mathrm{ASD} = \left|\sum_{m=1}^{M}\sum_{n=1}^{N}\left(x_{m,n}-x^{*}_{m,n}\right)\right| \qquad \text{(III-1)}$$
where $x_{m,n}$ represents the pixel value within the current block and $x^{*}_{m,n}$ represents the pixel value in the reference block. There are two primary reasons for choosing ASD over other metrics. First, the ASD is computationally efficient compared to MSE and SAD. The ASD employs only additions and subtractions and a single absolute value operation. MSE requires expensive multiplication operations, making it ill suited to real-time applications. SAD requires the same number of additions and subtractions as ASD, but it requires $M \times N - 1$ more absolute value operations. Second, since the ASD takes a single absolute value of a sum, it acts as an accumulator and provides a lowpass filtering effect that suppresses noise in pixel intensities introduced through video capture. This smoothing prevents the threshold from being exceeded spuriously in otherwise static regions of the frame sequence. By contrast, the nonlinear per-pixel operations in MSE and SAD tend to accumulate this noise energy and yield spurious block selections. Thus, the ASD metric allows bandwidth to be devoted more effectively to regions of interest [14].
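The lowpass argument can be checked with a small Python simulation. An unchanging 8x8 block is captured twice with independent Gaussian noise, and the simulation counts how often each metric crosses the 160 threshold used in this section. The noise level (sigma = 3), the 200 trials, and the fixed seed are illustrative assumptions, not measurements from the test sequences. Blocks are flattened to 64-element lists for brevity.

```python
import random


def sad(cur, ref):
    """Sum of absolute differences (flattened blocks)."""
    return sum(abs(c - r) for c, r in zip(cur, ref))


def asd(cur, ref):
    """Absolute sum of differences (flattened blocks)."""
    return abs(sum(c - r for c, r in zip(cur, ref)))


def spurious_selections(metric, sigma, threshold=160, trials=200, seed=0):
    """Count how often capture noise alone pushes a static 8x8 block over the threshold."""
    rng = random.Random(seed)
    scene = [128.0] * 64  # static 8x8 block, flattened
    hits = 0
    for _ in range(trials):
        ref = [p + rng.gauss(0, sigma) for p in scene]
        cur = [p + rng.gauss(0, sigma) for p in scene]
        hits += metric(cur, ref) > threshold  # True counts as 1
    return hits
```

With zero-mean noise, the positive and negative differences largely cancel inside the ASD. In the SAD, each difference adds its magnitude, so the SAD flags most of these noise-only frames while the ASD flags essentially none. Because of the triangle inequality, the ASD can never exceed the SAD for the same pair of blocks.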

Two independent elements affect video quality and, therefore, the required bit 
rate: adequate motion detection (to prevent “jerky” reconstructed video) and control of 
distortion introduced through quantization. The goal in motion detection is to select the 
maximum block selection threshold that adequately captures motion. In the video 
sequences examined, a threshold of 160 (for ASD) proved adequate for detecting 
perceptual motion and resulted in an average of 24.8 macroblocks selected per frame. 

Further considering computational expense, the ASD metric is applied to the individual 8x8 blocks within the macroblock; the first block to exceed the threshold triggers macroblock selection and ends the search. This avoids the expense of examining the remaining blocks of the macroblock. Additionally, since the HVS is more sensitive to changes in luminance than in chrominance [4], distance calculations are confined to the luminance component of each pixel even when chrominance information is available.
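The following Python sketch shows the early-exit selection test on a 16x16 luminance macroblock. It uses the ASD threshold of 160 given above and a clockwise block order. It is not the m_blk_id_xr.m implementation. The block indexing (0 = upper left, 1 = upper right, 2 = lower left, 3 = lower right) and all function names are assumptions for illustration.

```python
def asd(cur, ref):
    """Absolute sum of differences between two equal-size blocks (lists of rows)."""
    return abs(sum(c - r for crow, rrow in zip(cur, ref) for c, r in zip(crow, rrow)))


def sub_block(mb, index, size=8):
    """Return 8x8 block `index` of a 16x16 luminance macroblock: 0=UL, 1=UR, 2=LL, 3=LR."""
    r0, c0 = (index // 2) * size, (index % 2) * size
    return [row[c0:c0 + size] for row in mb[r0:r0 + size]]


CLOCKWISE = (0, 1, 3, 2)  # UL, UR, LR, LL


def select_macroblock(cur_y, ref_y, threshold=160, order=CLOCKWISE):
    """Early-exit block updating test on luminance only.

    Returns (selected, blocks_examined, triggering_block_or_None).
    """
    examined = 0
    for b in order:
        examined += 1
        if asd(sub_block(cur_y, b), sub_block(ref_y, b)) > threshold:
            return True, examined, b  # stop: remaining blocks are never examined
    return False, examined, None
```

An unchanged macroblock costs all four block tests. If only the upper-right block brightens by 5 levels (ASD = 64 × 5 = 320), the clockwise search stops after the second block.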











Since motion in VTC scenes tends to be confined to discrete objects within the scene (as opposed to scene motion caused by a camera pan), search efficiency is slightly affected by the order in which the individual blocks are examined. The approach that proved more efficient in the test video sequences considered here is to maximize the distance between the first two blocks examined. As shown in Figure III-4, two search patterns were compared: a clockwise search starting from the upper left block, and an X-pattern search that examines the upper left block followed by the lower right. Given that a macroblock was selected for transmission, the X-pattern resulted in a 2.5% decrease in the average number of blocks examined per frame.





Figure III-4: Block Search Order: (a) Clockwise and (b) X-pattern.


Still greater search efficiency is realized by using the X-pattern search and varying the starting block of each macroblock to match the anticipated motion at that point in the frame. Because motion in VTC sequences is fairly confined, macroblocks tend to be selected in the same manner frame-to-frame; for example, a speaker shifts left or right and/or slightly up or down. Therefore, search speed is increased by having the search algorithm remember the identity of the specific block, termed the “anchor,” that caused a particular macroblock to be selected in the previous frame. For the subsequent distance metric calculations, the search begins at the anchor. If the anchor again causes selection, or if the macroblock is not selected, the anchor identity is unchanged. If another block causes selection, the anchor identity is updated. This search scheme produced an additional 20% reduction in the number of blocks examined for selected macroblocks. A more complex approach, not examined here, is to remember the two blocks that most frequently cause selection and tailor the search accordingly.
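The anchor bookkeeping can be sketched in Python as follows. The search starts at the anchor and jumps to the diagonally opposite block, as in the X-pattern. The order of the last two blocks is not recoverable from Figure III-4, so index order is an assumed tie-break. The `distance` callback stands in for the per-block ASD and is evaluated lazily, so unexamined blocks cost nothing. All names are hypothetical.

```python
def x_pattern_order(anchor):
    """Visiting order for the four 8x8 blocks (0=UL, 1=UR, 2=LL, 3=LR).

    Start at the anchor, jump to the diagonally opposite block (3 - anchor),
    then visit the remaining two in index order (an assumed tie-break).
    """
    opposite = 3 - anchor
    rest = [b for b in range(4) if b not in (anchor, opposite)]
    return [anchor, opposite] + rest


class AnchorSearch:
    """Per-macroblock anchor memory for the early-exit selection test."""

    def __init__(self, n_macroblocks, threshold=160):
        self.threshold = threshold
        self.anchor = [0] * n_macroblocks  # every search initially starts at UL

    def select(self, mb, distance):
        """distance(b) returns the ASD of block b; returns (selected, blocks_examined)."""
        examined = 0
        for b in x_pattern_order(self.anchor[mb]):
            examined += 1
            if distance(b) > self.threshold:
                self.anchor[mb] = b  # no change if the anchor itself triggered
                return True, examined
        return False, examined  # not selected: anchor left unchanged
```

Suppose a speaker's motion stays in the lower-right block. The first frame costs two block tests (UL, then LR), after which LR becomes the anchor, and later frames need only one.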














2. Aging Algorithm 


Utilizing this block updating scheme alone for macroblock refreshment exposes 
some undesirable performance characteristics. One such consequence is termed 
hysteresis. Consider an arbitrary macroblock whose content is changing temporally due 
to the movement of an item within its spatial bounds. The item travels from its initial 
position along some trajectory to its final position. At some point in the trajectory, let the 
change in the macroblock’s content be sufficient to exceed the motion-detection 
threshold, so the macroblock is selected for update. If the final position of the item has 
not yet been reached at the time of macroblock selection, the item continues moving toward
its destination. Once the final position is reached, hysteresis occurs if the distance between
this final position and the item’s position at the last macroblock update is insufficient to
force another macroblock selection; that is, the distance is less than the threshold. In this
case, the displayed macroblock at the receiver is left with a persistent error. As another 
illustration of hysteresis, consider a frame sequence depicting only slight motion 
contained within an arbitrary macroblock’s spatial position. If the change between 
successive macroblocks as calculated by the distance metric is below the threshold value, 
the macroblock is not selected for encoding. If several frames in the video sequence 
continue to depict similar slight motion below the threshold, the displayed macroblock at 
the receiver eventually shows a persistent error. 

Another problem is the duration for which error artifacts (introduced by the channel through dropped or corrupted packets) persist in a receiver’s reconstructed video. Error artifacts in the dynamic portion of a scene tend to last only a single frame because block updates occur frequently there. However, any such error in a relatively static region tends to persist longer due to the much lower frequency of block updating.

A final problem occurs when new participants are allowed to join a VTC already in progress (dynamic multicast). Since only those macroblocks depicting motion above the perceptual threshold are transmitted, new participants receive only a portion of the current scene. Although the portion received is the most dynamic region of the frame (e.g., the speaker), a speaker completely disassociated from the background yields an awkward reconstruction.

Complementing the block updating scheme with an aging algorithm that intermittently forces macroblock updates alleviates these problems. The general principle is that the coder monitors the number of frames since each macroblock was last encoded, that is, its “age.” If a macroblock’s age exceeds a predetermined interval, that macroblock is flagged for encoding and transmission. Thus, the aging algorithm guarantees a maximum period between macroblock updates and bounds both the duration of hysteresis errors and the persistence of visual artifacts caused by the channel. Such bounding also ensures that new participants in a dynamic multicast session can construct an entire frame in a timely fashion.

Obviously, forcing the selection of macroblocks that would not otherwise have been selected increases the bandwidth requirement, but this impact can be mitigated by the manner in which the aging algorithm is implemented. The important considerations are the following:

- Simply increasing the interval for macroblock selection by aging decreases the required bandwidth. However, it increases the persistence of decoding errors at receivers and prolongs the time required for new participants to receive a full frame.
- Merely forcing a macroblock update after n frames have passed without selection leads to an undesirable correlation in updates following scene changes. Although motion within a scene tends to disperse updates to some extent, a sufficiently static background region would still correlate a significant fraction of the block updates.
- The worst case is a scene change to an entirely static scene, such as an overhead slide. In this case, the bit rate would spike every n frames.

To avoid such spikes in bit rate, it is advantageous to spread the macroblocks selected by aging evenly over time. This, in turn, requires a scheme that ages each block independently.

The aging algorithm developed for the coder implemented here tracks the age of each macroblock indirectly. Instead of counting the frames since a given macroblock was last updated, the coder maintains an update vector identifying the number of frames remaining until each macroblock must be updated. Each value, $m$, in the update vector is chosen from a discrete uniform distribution on $[1, n]$ and is interpreted as an update scheduled $m$ frames in the future. Using a uniform distribution to schedule updates smooths macroblock selection over $n$ frames and decorrelates the selections due to aging. Choosing the aging intervals randomly also prevents events such as scene changes from correlating updates and causing periodic spikes in bit rate. The specific value chosen for $n$ controls the tradeoff between the additional bandwidth required and coder responsiveness. Because the mean of a discrete uniform distribution on $[1, n]$ is $(n+1)/2$, each macroblock is refreshed by aging once every $(n+1)/2$ frames on average. For a given value of $n$, the average number of macroblocks selected through aging per frame, $M_{age}$, is therefore

$$M_{age} = \frac{2M}{n+1} \qquad \text{(III-2)}$$

where $M$ is the total number of macroblocks in a frame.

The final block selection algorithm is performed by the function m_blk_id_xr.m, which is provided in Appendix C and incorporates motion detection and aging as follows. As each frame is captured, the update vector entry corresponding to each macroblock is decremented by one. As each macroblock is processed for selection in a given frame, the coder examines the macroblock’s entry in the update vector. If the entry has reached zero, the macroblock is selected for transmission. Otherwise, the distance metric is applied to determine whether the macroblock should be selected due to motion. If either criterion is met, a new random entry is generated for that update vector position. The order of these two checks is important: since the distance metric need not be calculated when the macroblock is selected by aging, there is a net decrease in the number of calculations required to select a macroblock for transmission.
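The update-vector scheme and Equation III-2 can be checked with a short Python simulation. This is a sketch modeled on the description above, not a transcription of m_blk_id_xr.m. The class name, the `exceeds_threshold` callback (a stand-in for the ASD test), and the seeds are hypothetical. The values M = 99 and n = 20 come from the QCIF example that follows.

```python
import random


class AgingSelector:
    """Update-vector aging combined with a motion test, one call per frame."""

    def __init__(self, M, n, seed=None):
        self.n = n
        self.rng = random.Random(seed)
        # Entry m means "update m frames from now", m drawn uniformly from [1, n].
        self.update = [self.rng.randint(1, n) for _ in range(M)]

    def next_frame(self, exceeds_threshold):
        """Return the macroblocks selected in this frame.

        exceeds_threshold(k) is only evaluated for macroblocks not already
        selected by aging, so aged blocks skip the distance calculation.
        """
        selected = []
        for k in range(len(self.update)):
            self.update[k] -= 1  # decrement as the frame is captured
            if self.update[k] <= 0 or exceeds_threshold(k):
                selected.append(k)
                self.update[k] = self.rng.randint(1, self.n)  # reschedule
        return selected
```

With no motion at all, averaging `len(selector.next_frame(lambda k: False))` over many frames approaches 2(99)/21 ≈ 9.43 macroblocks per frame. No macroblock ever goes more than n = 20 frames without an update.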

For the assumed VTC video format of 176x144 QCIF, M is 99. At the assumed frame rate of 10 fps, setting n to 20 guarantees that all macroblocks are encoded within two seconds. It also yields 2(99)/21 ≈ 9.43 as the average number of macroblocks selected through aging per frame. However, the true bandwidth impact is less than this value, since some of the macroblocks selected via aging would have been selected by motion anyway. For the test motion video sequences considered here, the average number of macroblocks selected per frame increased to 31.4 (from 24.8 without aging). The payoff for this modest increase in bandwidth is that all the problems encountered by using the distance metric alone for macroblock selection are bounded to at most this duration. The duration of two seconds represents a compromise judged acceptable between the additional bandwidth required and the desired coder responsiveness.
3. Layering Strategy 


Macroblocks selected for transmission are decomposed in frequency using a DWT. Performing the selection process prior to the transform reduces computational cost because the transform is applied only to those macroblocks requiring transmission. The DWT was chosen since frequency-based decomposition, as discussed earlier, offers the most flexibility in the subsequent populating of layers. However, a consistent, extendable scheme is still needed to determine an appropriate number of layers and how the frequency content within each macroblock is apportioned across those layers.

Parker [9] proposes a set of heuristic guidelines to determine the appropriate layer 
assignments and bit allocation. First, as layers are to be hierarchical in importance, layer 
assignments should map frequency content to that hierarchy in a manner consistent with 
perceptual importance. Second, the base layer must provide acceptable quality, and the 
addition of each enhancement layer must provide a perceptual improvement in quality. 
Since the broad goal in image or video coding is to remove information that is not 
perceptually relevant, transmitting a layer that provides no perceptual improvement in 
quality wastes bandwidth and is, therefore, undesirable. Third, the number of bits 
assigned to each layer should be substantive so that dropping a layer potentially decreases 
congestion in the network. Finally, the bandwidth consumed by the image data should 
also be sufficient to avoid an excessive relative consumption by network control symbols 
and overhead. This is especially critical for low bit rate video. 

The DWT chosen was the fast Haar transform (FHT). The FHT has several desirable properties with regard to minimizing coder complexity. First, the FHT, as a real transform, avoids the necessity for complex arithmetic and simplifies storage. Second, the FHT is not computationally demanding, requiring only addition, subtraction, and left- and right-shifts [13]. Finally, unlike more sophisticated wavelet transforms, the FHT does not require extending or padding the data set. However, the simplicity of the FHT may lead to blocking artifacts at high compression levels, since the average and detail calculations are confined to contiguous pixels.

Since the defining equations of the FHT and their application to effect a first order frequency decomposition were discussed previously in Section II.B.2, only a summary of the first order FHT decomposition is given in Table III-1. The operation is accomplished via the function fht.m, provided in Appendix C. Again, higher order frequency decompositions may be accomplished by recursively applying the FHT to each of the subbands created by the next lower order decomposition.
Table III-1: Significance and Determination of DWT Subbands.
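The recursive, uniform decomposition can be sketched in Python. The FHT is applied to every subband at each order, not just the LL subband. The sketch uses floating-point averages and half-differences as an assumed normalization. An integer version built only from additions, subtractions, and shifts (as cited for fht.m) would differ in scaling, and nothing here reproduces fht.m itself. Function names are hypothetical.

```python
def haar_1d(v):
    """One Haar analysis step: pairwise averages, then pairwise half-differences."""
    half = len(v) // 2
    avg = [(v[2 * i] + v[2 * i + 1]) / 2 for i in range(half)]
    det = [(v[2 * i] - v[2 * i + 1]) / 2 for i in range(half)]
    return avg + det


def fht_subbands(block):
    """First order 2-D Haar decomposition, returning [LL, LH, HL, HH] as separate blocks."""
    rows = [haar_1d(r) for r in block]
    coeffs = [list(r) for r in zip(*[haar_1d(list(c)) for c in zip(*rows)])]
    h, w = len(coeffs) // 2, len(coeffs[0]) // 2
    return [
        [r[:w] for r in coeffs[:h]],  # LL
        [r[w:] for r in coeffs[:h]],  # LH
        [r[:w] for r in coeffs[h:]],  # HL
        [r[w:] for r in coeffs[h:]],  # HH
    ]


def uniform_decomposition(block, order):
    """Apply the FHT to every subband `order` times (a uniform, not dyadic, tree)."""
    bands = [block]
    for _ in range(order):
        bands = [sub for band in bands for sub in fht_subbands(band)]
    return bands
```

For a 16x16 macroblock, orders 1 through 4 give 4, 16, 64, and 256 subbands of size 8x8, 4x4, 2x2, and 1x1, respectively.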
Although a greater number of layers would offer more flexibility in managing quality and congestion, the coder restricts the number of layers to three. This limit was driven primarily by the target bit rate range of 64-96 kbps. At these bit rates, with each layer consuming an equal amount of bandwidth overhead, three layers appeared to be the most that could be supported. With three layers, each enhancement layer still provided a perceptual improvement in quality, while the base layer still rendered acceptable quality.

Establishing a suitable layered structure for motion video sequences can be posed as the following conditional, two-part problem: given that n layers are desired, determine the degree to which a selected macroblock is decomposed in frequency and the manner in which the resulting subbands are assigned to layers. Parker [9] proposes adapting a variant of a split-and-merge algorithm, originally proposed by Diab et al. [25] to identify regions of equivalent activity in the spatial domain. The variant is applied at the macroblock level in the frequency domain to identify regions of similar energy and perceptual content. The particulars of the implementation follow, but the essence of the approach is that a selected macroblock is uniformly decomposed in frequency via the FHT, subbands of approximately equal variance are merged, and the resulting regions are apportioned into individual layers. These steps were performed on representative video sequences, and the ensuing layering structure is implemented within the coder.

Utilizing representative video sequences, each selected macroblock is uniformly decomposed in frequency by recursively applying the FHT until the desired number of subbands is obtained. The first order analysis creates four 8x8 subbands, the second order analysis produces sixteen 4x4 subbands, and the third order analysis results in sixty-four 2x2 subbands. Extending the recursion to its limit yields 256 subbands, each consisting of a single point. In practical terms, a second order analysis, as shown in Figure III-5, proved sufficient for three layers.


Figure III-5: Second Order FHT Decomposition of a Macroblock.


Next, the variance of the coefficient set composing each subband was determined across all frames of video. These variances were then used as a subband merging metric. Employing subband variance as the merging metric affords two benefits. First, with motion video, the variance of a coefficient set appears to bear an inverse relationship to spatial frequency and, by extension, a direct relationship to perceptual importance. That is, the more perceptually relevant subbands at lower frequencies exhibit higher variances. Consequently, differences in variance provide a convenient mechanism for assigning subbands to a layered hierarchy. Second, grouping subbands with similar variances simplifies coder architecture, since each group can employ a common quantizer. Several quantization algorithms use variance as an indication of the dynamic range exhibited by coefficients and allocate bits by varying quantizer step size in inverse proportion to variance. The particulars of the quantization scheme employed are discussed in more detail later.

The subband variances calculated from a first order analysis of the test video sequences are shown in Figure III-6, and those from a second order analysis are shown in Figure III-7. As Figure III-6 illustrates, subband variance provides a good indication of the energy concentration within each subband. For instance, since motion video is characteristically lowpass, the variance is largest in the LL subband and smallest in the HH subband. By extension, subband variance therefore serves as a relative indicator of perceptual importance among subbands, an observation that suggests subband variance should dictate layer assignments.

As Figure III-7 illustrates, a second order analysis further separates the frequency content of the subband to which it is applied. The LL subband decomposes into the LLLL, LLLH, LLHL, and LLHH subbands. These contain, respectively, the lowpass, horizontal, vertical, and diagonal edge details that were previously lumped into the LL subband alone. Performing the second order analysis of the LH (horizontal detail) subband apportions energy in a manner analogous to the original first order analysis. Just as the LH subband is found in the northeast quadrant of Figure III-6, the energy of its second order analysis is concentrated in the northeast and northwest sub-quadrants, with slightly greater energy in the northeast sub-quadrant, as shown in Figure III-7. Similar observations can be made for the HL and HH subbands. These characteristics further support the argument for using subband variances to make layer assignments in a hierarchical manner.
Figure III-6: Subband Variances after a First Order Analysis (Motion Video).
Figure III-7: Subband Variances after a Second Order Analysis (Motion Video).





After variance data had been gathered for each subband at the desired order of analysis, the next step was to merge adjacent subbands exhibiting similar variances into an entity termed a “partition.” The criterion outlined by [9] is to merge adjacent subbands $k_1$ and $k_2$ when
where $N_{sb}$ is the total number of subbands created by successive application of the FHT, and $\sigma^{2}_{min}$ and $\sigma^{2}_{max}$ are the minimum and maximum variances, respectively, found among all the subbands. Applying the merge algorithm to the subbands shown in Figure III-7 results in the partitions shown in Figure III-8. Assuming that subbands are statistically independent, the variance of each partition $P_j$ is simply the sum of the variances of the subbands $k_i$ comprising that partition:

$$\sigma^{2}_{P_j} = \sum_{k_i \in P_j} \sigma^{2}_{k_i} \qquad \text{(III-5)}$$
Figure III-8: Partitions Resulting from Merge Algorithm (Motion Video).
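The two variance computations in this step can be sketched in Python. The first pools every coefficient of one subband across all frames and computes its variance. The second sums member variances per Equation III-5. The sketch uses the population variance (dividing by the count), which is an assumption; the text does not state which estimator was used. The subband labels in the usage note are hypothetical. The merge criterion itself is not implemented here.

```python
def pooled_variance(coeff_sets):
    """Variance of one subband's coefficients pooled over every frame/macroblock.

    coeff_sets: list of 2-D coefficient blocks, all belonging to the same subband.
    """
    vals = [c for band in coeff_sets for row in band for c in row]
    mean = sum(vals) / len(vals)
    return sum((c - mean) ** 2 for c in vals) / len(vals)


def partition_variance(subband_var, partition):
    """Equation III-5: sum of member subband variances (independent subbands assumed)."""
    return sum(subband_var[k] for k in partition)
```

For example, with hypothetical subband variances {"LHLL": 4.0, "LHLH": 2.5, "HH": 0.5}, a partition formed from LHLL and LHLH has variance 6.5.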


Next, these partitions were assigned to layers $L_i$, using the set of heuristic rules proposed in [9], until the requisite number of layers was created.

Rule 1: No layer may have a greater variance than a lower layer. That is, given $N$ layers,

$$\sigma^{2}_{L_1} \ge \sigma^{2}_{L_2} \ge \cdots \ge \sigma^{2}_{L_N} \qquad \text{(III-6)}$$

Rule 2: Layers must be populated in order of increasing frequency. A layer may not contain a partition of lower frequency content than any layer below it.

Rule 3: Partitions that meet the criterion given by Equation III-3 are assigned to the same layer, even if the partitions are non-contiguous.

Rule 4: Partitions are assigned to layers in a symmetric fashion.

Rule 5: If more than two subbands comprising a coarser subband remain as partitions after merging and applying the above rules, all of the partitions comprising the coarser subband are merged into one partition.

Rule 6: If one or more partitions are moved between layers, as required to achieve a more balanced distribution of bit rates or quality, move the partition(s) with the lowest variance if promoting to a higher layer and the partition(s) with the highest variance if demoting to a lower layer.
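Rules 1 and 2 can be checked mechanically. The following Python sketch adopts two interpretations that are assumptions, not statements from [9]:

- A layer's variance is taken as the sum of its partitions' variances, extending Equation III-5 under the same independence assumption.
- Each partition carries a hypothetical scalar frequency rank, where a lower rank means lower frequency.

Rules 3 through 6 involve judgment about symmetry and balance and are not modeled.

```python
def layer_variances(layers, part_var):
    """Variance of each layer as the sum of its partitions' variances."""
    return [sum(part_var[p] for p in layer) for layer in layers]


def rule1_ok(layers, part_var):
    """Rule 1 / Equation III-6: variance must not increase from layer 1 upward."""
    v = layer_variances(layers, part_var)
    return all(lower >= upper for lower, upper in zip(v, v[1:]))


def rule2_ok(layers, freq_rank):
    """Rule 2, read as: no partition may rank lower in frequency than any partition in a lower layer."""
    for i in range(1, len(layers)):
        below = [freq_rank[p] for layer in layers[:i] for p in layer]
        if below and any(freq_rank[p] < max(below) for p in layers[i]):
            return False
    return True
```

Consider hypothetical partitions A (lowpass), B and C (mid), and D (highest frequency). The ordering [A], [B, C], [D] passes both checks. Swapping the upper two layers fails both.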


Application of these rules to the partitions shown in Figure III-8 culminated in the final layering scheme for motion video sequences shown in Figure III-9. The base layer is essentially a lowpass-filtered version of the original macroblock, and the two enhancement layers progressively add higher frequency detail. Additionally, since the LL subband retains many of the perceptual properties of the original macroblock, it was further transformed using the 2-D DCT via the function dct_of_fht.m. This additional transform allowed the LL subband to be processed using JPEG-based quantization and encoding techniques to maximize retention of the most perceptually relevant information.

Summarizing the process as implemented in the coder, each macroblock selected for transmission is decomposed in frequency using the FHT. The LL subband is further transformed via the 2-D DCT as previously discussed and assigned to layer I. The HH subband is assigned to layer III in its entirety. The HL and LH subbands are decomposed with a second application of the FHT. The resulting subbands are then partitioned and assigned to layers II and III as appropriate.
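The decomposition steps just summarized can be sketched as follows. This is a minimal Python illustration, not the thesis's Matlab code: it assumes a 16x16 macroblock, an orthonormal 2-D Haar step (sums and differences scaled by 1/2), and a naming convention in which the first letter refers to horizontal filtering and the second to vertical filtering. The function names are hypothetical, and the 2-D DCT applied to LL is omitted.

```python
def haar_2d(block):
    """One level of an orthonormal 2-D Haar analysis of a square, even-sized block.

    Returns (LL, LH, HL, HH), each half the size of the input.  Assumed naming:
    first letter = horizontal filter, second letter = vertical filter.
    """
    h = len(block) // 2
    bands = [[[0.0] * h for _ in range(h)] for _ in range(4)]
    ll, lh, hl, hh = bands
    for r in range(h):
        for c in range(h):
            a, b = block[2 * r][2 * c], block[2 * r][2 * c + 1]
            d, e = block[2 * r + 1][2 * c], block[2 * r + 1][2 * c + 1]
            ll[r][c] = (a + b + d + e) / 2  # lowpass in both directions
            lh[r][c] = (a + b - d - e) / 2  # horizontal lowpass, vertical highpass
            hl[r][c] = (a - b + d - e) / 2  # horizontal highpass, vertical lowpass
            hh[r][c] = (a - b - d + e) / 2  # highpass in both directions
    return ll, lh, hl, hh


def decompose_motion_macroblock(mb):
    """First-order FHT of the macroblock, then a second FHT of LH and HL.

    LL (destined for the 2-D DCT) and HH are kept whole; LH and HL are split
    into four child subbands named e.g. 'HLLL', 'HLHH'.
    """
    ll, lh, hl, hh = haar_2d(mb)
    subbands = {"LL": ll, "HH": hh}
    for parent, band in (("LH", lh), ("HL", hl)):
        for suffix, child in zip(("LL", "LH", "HL", "HH"), haar_2d(band)):
            subbands[parent + suffix] = child
    return subbands


def variance(band):
    """Population variance of all coefficients in a subband."""
    vals = [v for row in band for v in row]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


if __name__ == "__main__":
    # Hypothetical test macroblock: a diagonal ramp.
    mb = [[float(r + 2 * c) for c in range(16)] for r in range(16)]
    for name, band in decompose_motion_macroblock(mb).items():
        print(f"{name:5s} {len(band)}x{len(band[0])}  var={variance(band):.3f}")
```

Because the Haar step is orthonormal, the total energy of the ten subbands equals that of the macroblock, which is what makes subband variances a fair basis for the split-and-merge comparison.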
Figure III-9: Final Layering Scheme for Motion Video Sequences.


The situation with static slide sequences consisting of text and line drawings is much different. Sequences such as these depend more heavily on their higher frequency components for perceptual recognition. A hierarchical layering scheme based on the lowpass characteristics of motion video yields blurred, indistinct reconstructions until the higher frequency components are included as well. Such a structure, of course, is contrary to the principle of layered video transmission. Therefore, a separate layering scheme is needed if the video stream is to include both types of sequences.

The foregoing remarks notwithstanding, pursuing the split-and-merge algorithm still provides valuable insight. The variances of the subbands produced by a first order FHT analysis of the text and line drawing static sequences are shown in Figure III-10; those of a second order FHT analysis are shown in Figure III-11. Compared to Figure III-6 and Figure III-7 (the analogous results for motion video sequences), energy is clearly much more evenly distributed among the subbands of static slides. In practical terms, this implies a much more complex relationship between variance and perceptual importance.
Figure III-10: Subband Variances after a First Order Analysis (Static Slides).

Figure III-11: Subband Variances after a Second Order Analysis (Static Slides).
Applying the split-and-merge algorithm as before results in the partitions identified in Figure III-12. Using the same rules of layer assignment, a reasonable layering scheme might assign partitions P1, P2, and P4 to layer I. However, this implementation led to a poor reconstruction. Even adding P3, P5, and P6, so that the vast majority of the energy was included in the reconstructed image, failed to provide acceptable quality. Clearly, in contrast to the motion video case, variance alone is a poor guide to perceptual relevance.
Figure III-12: Partitions Resulting from Merge Algorithm (Static Slides).


















Producing an “effective” base layer required allocating to it a portion of the frequency content from each of the 8x8 subbands. Therefore, while maintaining symmetry in assignment, those 4x4 subbands that exhibited comparatively larger variances among the four subbands derived from the same parent subband were allocated to layer I. The remaining subbands were divided between the remaining layers in order of increasing frequency. The final layering scheme for static slide sequences is shown in Figure III-13.
Figure III-13: Final Layering Scheme for Static Slide Sequences.


Although the final layering scheme does not stem directly from the partitions in Figure III-12, continued analysis of them is nonetheless insightful. Merging partitions with similar variance reduces the partitions to those shown in Figure III-14. The difference in variances of partitions P1 and P2 is only slightly too large to permit their merging by the criterion of Equation III-3. The difference is small enough, however, to justify quantizing both bands with the same step size; the simplicity gained by quantizing them together offsets any potentially suboptimal bit allocation. Therefore, the final partitions for quantization purposes are as given in Figure III-15.





Figure III-15: Partitions for Quantizing Purposes (Static Slides).


4. Quantization and Entropy Encoding 


After the transform stage, the coefficients are quantized and subjected to entropy encoding. For motion video sequences, the process is depicted in Figure III-16. The base layer (LL subband) is quantized using the luminance matrix (Figure II-6) via the function quantizer_ll.m, run-length encoded using a zigzag scan via the function zz_b.m, and converted to a VLC using the luminance VLC table suggested in [4]. The conversion to a VLC is accomplished stepwise via three functions: make_it_compact.m, parse_Huff.m, and get_bits_Huff.m. This approach leverages the LL subband’s retention of the lowpass characteristics of the original macroblock. The implementation is as discussed previously in Sections II.C and II.D, with the scaling factor of the quantization matrix designated as q1. The remaining subbands are each quantized uniformly, using a common step size for all of a subband's coefficients. Using variance as an indication of the dynamic range of the coefficients within a given subband and comparing to Figure III-9, it is reasonable to quantize all the subbands composing each enhancement layer with the same step size. Therefore, the quantizer step size for layer II is set to q2; that of layer III is set to q3.
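As a rough Python sketch of the subband-to-layer and step-size mapping just described (hypothetical names throughout): the layer assignments follow Figure III-16, while the midtread rounding rule of the uniform quantizer is an assumption, since the text states only that quantization is uniform.

```python
import math

# Layer of each motion-video subband, following Figure III-16.
LAYER_OF_SUBBAND = {
    "LL": 1,
    "HLLL": 2, "HLHL": 2, "LHLL": 2, "LHLH": 2,
    "HLLH": 3, "HLHH": 3, "LHHL": 3, "LHHH": 3, "HH": 3,
}


def step_for_subband(name, triplet):
    """Uniform step size for an enhancement-layer subband: q2 for layer II, q3 for layer III."""
    _q1, q2, q3 = triplet
    layer = LAYER_OF_SUBBAND[name]
    if layer == 1:
        # LL uses JPEG quantization with q1 as a matrix scaling factor instead.
        raise ValueError("LL is JPEG-quantized; q1 is a scaling factor, not a step size")
    return q2 if layer == 2 else q3


def quantize_uniform(values, step):
    """Assumed midtread rule: index = sign(x) * floor(|x| / step + 1/2)."""
    if step <= 0:
        raise ValueError("step size must be positive")
    return [int(math.copysign(math.floor(abs(v) / step + 0.5), v)) for v in values]


def dequantize_uniform(indices, step):
    """Decoder-side reconstruction at the centre of each quantization bin."""
    return [i * step for i in indices]


if __name__ == "__main__":
    triplet = (6, 12, 20)  # the fixed triplet used in Chapter IV
    step = step_for_subband("HLHL", triplet)
    idx = quantize_uniform([13.0, -13.0, 5.9, 30.0], step)
    print(step, idx, dequantize_uniform(idx, step))
```

Keeping one step per layer means a rate controller only has to move two numbers (q2, q3) for all nine enhancement subbands.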


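[Figure: LL → JPEG quantization and JPEG VLC → Layer I; HLLL, HLHL, LHLL, LHLH → uniform quantizer (q2), RLE, custom VLC → Layer II; HLLH, HLHH, LHHL, LHHH, HH → uniform quantizer (q3), RLE, custom VLC → Layer III.]

Figure III-16: Quantization and Coding for Motion Video Macroblocks.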


Unlike the JPEG-based coding of the LL subband, zigzag scanning of the quantized FHT coefficients provided no apparent coding gain. Instead, trials indicated that a simpler horizontal raster scan was adequate for all subbands except HL, which showed a slight preference for a vertical raster scan. This is plausible given the frequency detail represented in this band. The scan orders are summarized in Table III-2, where the scan order applies to the indicated subband as well as its child subbands. (The LL entry pertains only to the coding of the static content macroblocks, as discussed later, but it is included here for completeness.) Horizontal raster scans are performed by the function raster.m; vertical raster scans are performed by the function vertical.m. Upon completion of the run-length encoding accomplished by the functions make_it_compact.m and parse_3D.m, the coefficients of each subband are represented by a set of codewords. Each codeword is then mapped to an entry in a custom VLC table via the function get_bits2.m. The VLC table is provided in Appendix A. Its structure mirrors the three-dimensional (3-D) event structure employed by the H.263 coding standard [26], and the methodology of its creation is presented in the next section.

Footnote: As executed in the code, the scaling factor q1 is a parameter in quantizer_ll.m. A value of 16 for q1 means no scaling of the values in Figure II-6. A value smaller than 16 results in finer quantization and less quantization noise. q1 must be positive.





Subband    Scan Order
LL         Horizontal
LH         Horizontal
HL         Vertical
HH         Horizontal

Table III-2: Scan Order for Run-Length Encoding Quantized Coefficients.
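The scan orders in Table III-2 can be sketched in Python as follows, assuming each band is stored as a list of rows. The function name is hypothetical and merely stands in for raster.m and vertical.m. A child subband inherits the scan order of its parent, so only the first two letters of its name are consulted.

```python
# Scan order per parent subband (Table III-2); child subbands inherit it.
SCAN_ORDER = {"LL": "horizontal", "LH": "horizontal", "HL": "vertical", "HH": "horizontal"}


def scan_subband(band, name):
    """Flatten a 2-D subband (list of rows) into a 1-D list for run-length coding."""
    order = SCAN_ORDER[name[:2]]
    rows, cols = len(band), len(band[0])
    if order == "horizontal":
        # Row by row, left to right.
        return [band[r][c] for r in range(rows) for c in range(cols)]
    # Column by column, top to bottom.
    return [band[r][c] for c in range(cols) for r in range(rows)]


if __name__ == "__main__":
    band = [[1, 2], [3, 4]]
    print(scan_subband(band, "LHLL"), scan_subband(band, "HLHH"))
```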


The quantization and encoding process for selected macroblocks from static sequences is shown in Figure III-17. This process differs from the motion video process in three ways. First, JPEG-based quantization is not used; all sixteen subbands are supplied to one of three independent uniform quantizers with fixed step sizes of q1, q2, and q3. Second, unlike the motion macroblock scheme, the step sizes are associated not with layers but with the partitions depicted in Figure III-15. Finally, the custom VLC table for static slides, provided in Appendix B, is used. The same three functions cited above accomplish these steps, with the appropriate VLC table selected via the argument kind of the function get_bits2.m, which signals the type of frame under consideration: motion video or static slide.
Figure III-17: Quantization and Coding for Static Macroblocks.


Neither Figure III-16 nor Figure III-17 indicates the presence of the control signal from the Control Unit in Figure III-3. The control signal allows manipulation of q1, q2, and q3 as required by a bit rate control scheme. Rate control is covered later.
5. Generating Customized VLC Tables 


The VLC coding scheme mirrors the 3-D event structure employed by the H.263 standard. Each non-zero coefficient is replaced by an equivalent event described by three RLE parameters, {LAST, RUN, LEVEL} [26], where LAST indicates whether there are any more non-zero coefficients in the current subband, RUN indicates the number of successive zeros that precede the non-zero coefficient, and LEVEL represents the magnitude of the quantized coefficient. Each event maps to a VLC codeword, to which a sign bit is appended to represent the sign of the coefficient.
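Event formation can be illustrated with a short Python sketch. The tuple layout and the sign convention (0 for positive, 1 for negative) are assumptions for illustration, and because the text does not say how an all-zero subband is signalled, the sketch simply returns no events for it.

```python
def to_events(coeffs):
    """Convert a scanned list of quantized coefficients into (LAST, RUN, LEVEL, SIGN) events.

    RUN   = number of zeros preceding the non-zero coefficient
    LEVEL = magnitude of the coefficient
    LAST  = 1 for the final non-zero coefficient of the subband, else 0
    SIGN  = appended sign bit (assumed 0 for positive, 1 for negative)
    """
    positions = [i for i, c in enumerate(coeffs) if c != 0]
    events = []
    prev = -1
    for k, i in enumerate(positions):
        c = coeffs[i]
        last = 1 if k == len(positions) - 1 else 0
        events.append((last, i - prev - 1, abs(c), 0 if c > 0 else 1))
        prev = i
    return events


if __name__ == "__main__":
    print(to_events([0, 0, 3, 0, -1, 0, 0]))
```

Trailing zeros after the LAST event cost nothing, which is one reason scan order matters for coding gain.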

Using various combinations of q1, q2, and q3, a series of representative motion video test sequences was processed. The processing followed the implementation as discussed thus far, except for the last step of mapping each codeword obtained from RLE to a VLC table. The relative frequency of occurrence of each RLE codeword was then used to create a Huffman VLC utilizing an optimal binary-branching tree. Since not every possible {LAST, RUN, LEVEL} event is guaranteed to occur in the set of test video sequences, the custom VLC table further mirrors the H.263 standard in that a default codeword length of 22 bits is used for any event not otherwise contained in the table.
A Huffman code offers the following advantages. The average codeword length is minimized because the more frequently occurring events are assigned shorter codewords (fewer bits) while the less frequently occurring events are assigned longer codewords (more bits). Additionally, no codeword is a prefix of any other codeword, which makes the resulting bit stream uniquely decipherable.
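Both properties can be seen by building a Huffman table from event counts. The Python sketch below uses the standard heap-based construction rather than the thesis's own Matlab routine, and it assumes, as a simplification, that an event missing from the table costs a flat 22 bits including its sign, whereas a table event costs its codeword plus one sign bit.

```python
import heapq
from itertools import count

ESCAPE_BITS = 22  # default length for events absent from the table (assumed to include sign)


def build_huffman(freqs):
    """Build a prefix-free code {symbol: bit string} from {symbol: count}."""
    if len(freqs) == 1:
        (only,) = freqs
        return {only: "0"}
    tie = count()  # tie-breaker so that dicts are never compared
    heap = [(n, next(tie), {sym: ""}) for sym, n in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, low = heapq.heappop(heap)   # least frequent subtree
        n2, _, high = heapq.heappop(heap)  # next least frequent subtree
        merged = {s: "0" + w for s, w in low.items()}
        merged.update({s: "1" + w for s, w in high.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


def event_bits(event, table):
    """Bits spent on one {LAST, RUN, LEVEL} event: codeword + sign bit, or the escape length."""
    code = table.get(event)
    return ESCAPE_BITS if code is None else len(code) + 1


if __name__ == "__main__":
    # Hypothetical event counts gathered from training sequences.
    counts = {(1, 0, 1): 45, (0, 0, 1): 13, (1, 1, 1): 12,
              (0, 1, 1): 16, (1, 0, 2): 9, (0, 0, 2): 5}
    table = build_huffman(counts)
    for ev, word in sorted(table.items(), key=lambda kv: len(kv[1])):
        print(ev, word)
    print(event_bits((0, 30, 7), table))  # unseen event, so the escape length applies
```

Repeatedly merging the two least frequent subtrees is what pushes rare events deep in the tree (long codewords) and common events near the root (short codewords).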

This same procedure is used here for both motion video sequences and static slide 
sequences. Again, using different VLC tables for the two basic types of video content is 
advantageous because of the inherent difference between their frequency content and, 


consequently, the different makeup of the RLE codeword population. 
6. Rate-Distortion Relationship 


Compressed video is inherently variable in bit rate since compression gain varies
with scene activity and complexity. However, transmission channels inevitably require a
constraint on bit rate due to finite channel capacity or QoS guarantees. Most commonly, 
bit rate is constrained to maintain a constant rate or to maintain a constant local-average 
bit rate over time. Many factors affect bit rate, but the most important is the tradeoff 
made between quantizer step size and image fidelity. A larger step size results in a lower 
bit rate and a larger amount of distortion. Reducing the step size increases the bit rate but 
reduces the amount of distortion. Rate control therefore requires evaluation of the rate- 
distortion relationship created by a particular coder design. 

The rate control problem may be posed as a resource allocation problem in terms of the rate-distortion relationship, where the goal is to minimize the distortion D subject to a bit rate constraint $R_c$ [24], i.e., $\min D$ subject to $R \leq R_c$. The corresponding optimization problem is solved using Lagrangian methods and yields the optimal solution for a particular rate constraint as a point along the rate-distortion curve. Figure III-18 shows a typical rate-distortion curve and an optimal solution for a target bit rate. While the true rate-distortion curve is guaranteed to be convex [17], the operational curve is influenced by the coder design, including the motion-detection scheme, the quantizer design, and lossless coding gains. Therefore, rate control schemes tend only to approximate the true rate-distortion relationship when determining a method for varying quantizer step size to achieve the desired bit rate.
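The Lagrangian formulation can be made concrete with a small Python sketch. It assumes a finite list of hypothetical (rate, distortion) operating points and bisects on the multiplier λ to find the lowest-distortion point chosen by minimizing D + λR that still satisfies R ≤ R_c. Such a search can only return points on the lower convex hull of the operating points, which is one reason a non-convex operational curve complicates matters.

```python
def lagrangian_choice(points, lam):
    """Return the (rate, distortion) point minimizing J = D + lam * R."""
    return min(points, key=lambda p: p[1] + lam * p[0])


def best_point_under_budget(points, r_c, lam_hi=1e6, iters=100):
    """Bisect on lam >= 0 for the smallest multiplier whose choice satisfies R <= r_c.

    Returns None if even the choice at lam_hi (effectively the lowest rate) exceeds the budget.
    """
    unconstrained = lagrangian_choice(points, 0.0)
    if unconstrained[0] <= r_c:
        return unconstrained          # the budget is not binding
    if lagrangian_choice(points, lam_hi)[0] > r_c:
        return None                   # no operating point fits the budget
    lo, hi = 0.0, lam_hi              # lo gives an infeasible choice, hi a feasible one
    for _ in range(iters):
        mid = (lo + hi) / 2
        if lagrangian_choice(points, mid)[0] <= r_c:
            hi = mid
        else:
            lo = mid
    return lagrangian_choice(points, hi)


if __name__ == "__main__":
    # Hypothetical operating points: (kbps, distortion as MSE).
    pts = [(10, 50.0), (20, 30.0), (30, 22.0), (40, 20.0)]
    print(best_point_under_budget(pts, 25))
```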





Figure III-18: Rate-Distortion Curve with Possible Optimal Solution.


Additionally, for a layered coder where multiple quantizer parameters are employed, the corresponding multi-dimensional aspect of the rate-distortion curve complicates the rate control problem. Assuming the distortions of the individual layers i are additive, the rate control problem becomes minimizing

$$D = \sum_{i=0}^{N-1} D_i \qquad \text{(III-7)}$$

subject to

$$\sum_{i=0}^{N-1} R_i \leq R_c \qquad \text{(III-8)}$$

where N is the total number of layers. The assumption of additive distortion implies that a change in the rate constraint calls for a suitable, coordinated change in all quantizer parameters to yield an optimal solution. However, since the rate-distortion curves of the operational coder are not necessarily convex, the approach above does not necessarily give optimal results. An alternate, albeit heuristic, approach proposed by Parker [9] is to simplify the control problem by constructing an operational rate-distortion curve of reduced dimensionality.
Considering the test motion video sequences, an operational distortion curve is created by first plotting total bit rate and distortion (measured by pSNR) separately over the three-dimensional space spanned by the set of candidate quantizers. This process captures the operational effect of the coder design on the rate-distortion relationship, such as the values of the quantizer parameters {q1, q2, q3} and the VLC coding gain, as well as any interdependence between layers. The result is best described as a four-dimensional (4-D) surface wherein both rate and distortion are functions of a triplet of quantizer parameters {q1, q2, q3}. Recall that the first parameter represents the JPEG scaling factor while the remaining parameters represent the actual quantizer step sizes. Next, the points representing the pSNR surface are sorted in descending order and associated with their corresponding average bit rates and quantizer triplets. Any triplet yielding a higher average bit rate for the same or lower pSNR is discarded. The result is an implicit vector quantization of the operational 3-D rate-distortion space, and the dimensionality of the operational rate-distortion curve is thereby reduced to 1-D, as shown in Figure III-19. Each point on the curve represents results from a single optimal triplet. Considering only those quantizer triplets associated with average bit rates around the target bit rates of 64-96 kbps, and taking a 5% change in the average bit rate of the coarsest quantizer triplet as a reasonable control step, the operational rate-distortion curve of Figure III-19 reduces to that shown in Figure III-20. The corresponding quantizer triplets are plotted in Figure III-21. These results indicate that an optimal rate control scheme does not necessarily increase or decrease each quantizer parameter in lockstep, as would be expected if distortion in each layer were independent.
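The two reduction steps can be sketched in Python as follows. The data layout (triplet, average rate, pSNR), the function names, and the greedy rule used to enforce the 5% control step are assumptions made for illustration; the text does not specify how points closer together than one control step were thinned.

```python
def pareto_prune(results):
    """Keep only triplets not dominated in the rate-pSNR sense.

    results: iterable of (triplet, avg_rate, psnr).  Walking in order of
    descending pSNR, a triplet is discarded if some already-kept triplet
    with the same or higher pSNR has the same or lower average rate.
    """
    ordered = sorted(results, key=lambda t: (-t[2], t[1]))
    kept, lowest_rate = [], float("inf")
    for triplet, rate, psnr in ordered:
        if rate < lowest_rate:
            kept.append((triplet, rate, psnr))
            lowest_rate = rate
    return kept


def reduce_to_control_table(curve, r_lo, r_hi, frac=0.05):
    """Restrict to a rate window and thin points to steps of frac * (coarsest rate).

    Returns entries ordered from the coarsest (lowest rate) to the finest triplet.
    """
    window = sorted((p for p in curve if r_lo <= p[1] <= r_hi), key=lambda p: p[1])
    if not window:
        return []
    step = frac * window[0][1]     # 5% of the coarsest triplet's average rate
    table = [window[0]]
    for entry in window[1:]:
        if entry[1] - table[-1][1] >= step:
            table.append(entry)
    return table


if __name__ == "__main__":
    # Hypothetical measurements: ((q1, q2, q3), kbps, pSNR in dB).
    results = [((8, 16, 24), 60.0, 28.0), ((6, 12, 20), 80.0, 29.5),
               ((6, 16, 20), 85.0, 29.0), ((4, 12, 16), 95.0, 30.2),
               ((4, 10, 16), 96.0, 30.1)]
    for row in reduce_to_control_table(pareto_prune(results), 55.0, 100.0):
        print(row)
```

Note that nothing in the pruning forces q1, q2, and q3 to move together from one retained entry to the next, which is consistent with the non-lockstep behavior reported for Figure III-21.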
Figure III-19: Operational Rate-Distortion Curve (Motion Video).

Figure III-20: Reduced Operational Rate-Distortion Curve (Motion Video).

Figure III-21: Quantizer Table Triplet Values (Motion Video).


The same approach was followed for static slide sequences. However, a slightly different behavior was observed. Instead of the expected continued decrease in MSE as bit rate increased with finer quantization, the behavior was as depicted in Figure III-22. Past a limiting value of MSE, finer quantization and the related increase in bit rate yielded no increase in image fidelity. After rebuilding the custom VLC table using the quantizer triplet at that limiting point, {q1, q2, q3} = {4, 16, 16}, the average bit rate became approximately 44 kbps, as shown. Since this bit rate is below 64 kbps, no additional bit rate control scheme was deemed necessary; all static slide sequences are quantized with the same triplet.





[Figure: horizontal axis, average bits per frame (bpf).]

Figure III-22: Operational Rate-Distortion Curve (Static Slides).


7. Bit Rate Control 


This approach to the rate-distortion problem provides a potential method for a simplified layered rate control scheme, since the set of possible quantizer parameters is reduced to a far smaller set of triplets. Considering each triplet as an optimal quantizer state, any control scheme would manipulate the quantizers for each layer of a motion video sequence by selecting only entries from this set via a simple table lookup. Parker [9] proposes two such schemes: one functions at the frame level; the other operates at the macroblock level. The former was examined and implemented in the coder for this thesis.

Using the operational rate-distortion curve, a linear control curve relating bits per frame B to quantizer setting Q is created, as shown in Figure III-23. The slope $\Delta B / \Delta Q$ represents the average increment or decrement in bits per frame with a step change in the quantizer table. Dividing this quantity by the average number of macroblocks selected per frame in the test sequences, $\bar{M}$, yields the desired control parameter $\beta$:

$$\beta = \frac{\Delta B / \Delta Q}{\bar{M}} \qquad \text{(III-9)}$$


The calculated value of $\beta$ resulting from the test motion video sequences was -11.346. This control parameter was then used to adjust the coder quantizer setting in accordance with the following scheme.
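One way to obtain β is sketched below in Python. It assumes the linear control curve is a least-squares line fitted to hypothetical (table index Q, bits per frame B) pairs; the text says only that a linear curve was created, so the fitting method and the function name are assumptions.

```python
def control_parameter(q_indices, bits_per_frame, mean_mbs_per_frame):
    """beta = (dB/dQ) / M_bar, with dB/dQ taken as the least-squares slope."""
    n = len(q_indices)
    q_mean = sum(q_indices) / n
    b_mean = sum(bits_per_frame) / n
    sxy = sum((q - q_mean) * (b - b_mean) for q, b in zip(q_indices, bits_per_frame))
    sxx = sum((q - q_mean) ** 2 for q in q_indices)
    slope = sxy / sxx                  # average change in bits per frame per table step
    return slope / mean_mbs_per_frame  # bits per macroblock per table step


if __name__ == "__main__":
    # Hypothetical control-curve samples (table index, average bits per frame).
    q = [0, 1, 2, 3, 4]
    b = [9100, 8650, 8240, 7790, 7380]
    print(round(control_parameter(q, b, 38.0), 3))
```

Normalizing by the mean macroblock count turns a per-frame slope into a per-macroblock one, so the controller can rescale it by however many macroblocks the current frame actually selects.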
Figure III-23: Operational Rate Control Curve (Motion Video).


At call setup, the average bit allocation per frame $\bar{B}$ is set to

$$\bar{B} = \frac{R_{ch}}{f} \qquad \text{(III-10)}$$

where $R_{ch}$ is the channel bit rate and $f$ is the frame rate. For each new frame i, the actual bit allocation of the last frame (i-1) is used to estimate the bit allocation error, or deviation, expected to result from the current frame i if the quantizer setting used in the previous frame is not changed. Accounting for the change in the number of macroblocks selected between the last and current frames, the expected deviation is

$$\Delta B_{\text{inter}} = \bar{B} - \frac{M_i}{M_{i-1}} B_{i-1} \qquad \text{(III-11)}$$

where $M_i$ is the number of macroblocks selected for transmission in the current frame, $M_{i-1}$ is the number of macroblocks selected for transmission in the previous frame, and $B_{i-1}$ is the number of bits used in the transmission of the previous frame. The required change in the quantizer setting is calculated using the deviation $\Delta B_{\text{inter}}$, the number of macroblocks selected for transmission in the current frame $M_i$, and the control parameter $\beta$:

$$\Delta Q_i = \operatorname{fix}\!\left(\frac{\Delta B_{\text{inter}}}{M_i \, \beta}\right) \qquad \text{(III-12)}$$

Here, fix(·) is the fixed integer operator that discards the fractional part of its argument (truncation toward zero). The result indicates that the quantizer setting from the last frame should be incremented or decremented by $\Delta Q_i$. If the quantizer setting has reached the upper or lower limit of the table, the value is not changed. This quantizer triplet selection scheme is accomplished via the function get_qd_entryf.m.
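The per-frame update of Equations III-10 to III-12 can be sketched in Python as below. The function names are hypothetical, and math.trunc plays the role of the fix operator. The quantizer table is assumed to be indexed from finest (0) to coarsest, so that bits fall as the index rises and β is negative. Clamping at the table limits and skipping the update when a frame selects no macroblocks are also assumptions.

```python
import math


def target_bits_per_frame(channel_rate_bps, frame_rate):
    """Equation III-10: average bit allocation per frame set at call setup."""
    return channel_rate_bps / frame_rate


def next_quantizer_index(q_prev, b_target, m_cur, m_prev, b_prev, beta, table_len):
    """One step of the frame-level controller (Equations III-11 and III-12).

    q_prev    : table index used for the previous frame
    b_target  : average bit allocation per frame, B_bar
    m_cur     : macroblocks selected in the current frame, M_i
    m_prev    : macroblocks selected in the previous frame, M_(i-1)
    b_prev    : bits spent on the previous frame, B_(i-1)
    beta      : control parameter from Equation III-9
    """
    if m_cur == 0 or m_prev == 0:
        return q_prev  # assumption: nothing to scale, so keep the setting
    delta_b = b_target - (m_cur / m_prev) * b_prev        # Equation III-11
    delta_q = math.trunc(delta_b / (m_cur * beta))         # Equation III-12, fix() truncates toward 0
    return min(max(q_prev + delta_q, 0), table_len - 1)    # stay within the table


if __name__ == "__main__":
    b_bar = target_bits_per_frame(80_000, 10)  # hypothetical 80 kbps at 10 frames/s
    print(next_quantizer_index(5, b_bar, 40, 40, 9000.0, -11.346, 20))
```

With these assumed values the previous frame overspent by 1000 bits, so the index moves two steps toward the coarser end of the table.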

This control scheme applies only to motion video; a single quantizer
parameter triplet is used for static slides. The one exception arises at a scene change.
When a scene change is detected by the coder (as defined in the next section), so much of 
the frame is selected for transmission that a spike in bit rate would occur if quantized 
with any of the triplets available in the control table. To avoid this undesirable spike, the 
first frame of a new scene is heavily compressed. Following this initial frame, the 


appropriate quantization technique ensues. 
8. Scene Change Detection and Scene Type Determination 


Since the bit rate must be suppressed during a scene change, and the coder must determine which of the two layering schemes to employ following a scene change, criteria for both decisions must be defined. The coder concludes that a scene change has occurred if the number of macroblocks selected exceeds a threshold. This threshold was determined from the block selection statistics of the test video sequence containing the most highly active content (though still considered low-motion video) of all the test video sequences examined. For this sequence, the average number of macroblocks selected per frame was 34.61, with a standard deviation of 10.19. The threshold was set at three standard deviations above the mean (65). The comparison operation is performed in the main code block, thesis.m. A frame sequence is determined to be static when the macroblocks chosen for transmission result solely from aging. In this case, the fixed quantizer triplet and the static VLC table from Appendix B are used. Otherwise, the sequence is deemed motion video, and the bit rate control scheme discussed above is employed with its associated custom VLC table from Appendix A. This determination is performed within the function m_blk_id_xr.m.
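The two decisions can be summarized in a short Python sketch. It assumes the coder knows, for each frame, how many macroblocks were selected by block updating and how many by aging alone. The function names are hypothetical stand-ins for the logic in thesis.m and m_blk_id_xr.m.

```python
def scene_change_threshold(mean_selected, std_selected, k=3.0):
    """Threshold k standard deviations above the mean number of selected macroblocks."""
    return mean_selected + k * std_selected


def classify_frame(n_updated, n_aged, threshold=65):
    """Return (scene_change, scene_type, vlc_appendix) for one frame.

    n_updated : macroblocks selected because they changed (block updating)
    n_aged    : macroblocks selected only because their age expired
    """
    scene_change = (n_updated + n_aged) > threshold
    if n_updated == 0:
        return scene_change, "static", "B"   # fixed triplet, static-slide VLC table
    return scene_change, "motion", "A"       # rate-controlled triplet, motion VLC table


if __name__ == "__main__":
    t = round(scene_change_threshold(34.61, 10.19))  # 34.61 + 3 * 10.19 = 65.18
    print(t, classify_frame(70, 2, t), classify_frame(0, 5, t))
```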


This chapter began with a presentation of the three basic techniques available for layered video coding. The approach implemented here, as proposed by Parker [9], is frequency-based and utilizes the FHT and the 2-D DCT for motion video sequences and the FHT alone for static slide sequences. Different frequency transforms are used due to the inherent differences in the perceptual frequency content of the two types of sequences. Frame refreshment was implemented as a block selection scheme applied at the macroblock level that captures perceptual changes due to motion within a scene and limits the duration of decoder errors at receivers by forcing macroblock updates via an aging algorithm. Quantization and encoding techniques were presented, including the use of the JPEG standard and uniform quantization coupled with one of two custom VLC tables. The issue of rate control was addressed, and the implementation of a scheme that reduces a 4-D rate control surface to a simple quantizer table lookup to control bit rate at the frame level was discussed. Finally, the manner by which the coder detects a scene change and determines the type of sequence under consideration was delineated.
IV. RESULTS


This chapter presents some results from a short video segment consisting of 100 frames of a single speaker followed by 50 frames of a presentation slide filled with line diagrams and text. The Matlab code is contained in Appendix C. A sample frame from each sequence is shown in Figure IV-1 and Figure IV-2. Each figure shows the original frame and the reconstructed frame with only the base layer received, with the base layer and the first enhancement layer received, and with all layers received.


[Figure panels include: Original Frame; Layers 1, 2, and 3.]

Figure IV-1: Original and Reconstructed Frames from a Motion Video Sequence.

Figure IV-2: Original and Reconstructed Frames from a Static Video Sequence.


Figure IV-3 shows the bit rate traces for the 150-frame layered video segment. Part (a) displays the traces resulting from a constant motion video quantizer triplet ({6, 12, 20}), which was expected to yield 80 kbps based on the average bit rate over all test motion video sequences, together with the fixed static sequence quantizer triplet. Part (b) displays the traces resulting from the bit rate control scheme applied to the motion video frames, with the coder attempting to achieve an average of 80 kbps, together with the fixed static sequence quantizer triplet. Bit rate spike suppression is employed for the initial frame of each scene as discussed previously. The distribution of bit rate offered by a layered video coder is evident, with the bit rate ratio among layers being approximately 5:3:2 for both sequences. As congestion occurs in the network, the higher layers can be dropped to combat the congestion while maintaining much of the quality, as illustrated in Figure IV-1 and Figure IV-2. Neglecting the initial frame, the average bits per frame for the motion video sequence without the control scheme is 7454 bpf with a standard deviation of 1362 bpf. With the control scheme, the average and standard deviation are 7988 bpf and 942 bpf, respectively. As expected, the bit rate of the static sequence is much lower, since it results solely from macroblock aging. This illustrates that rate control is of little benefit for static sequences.
Figure IV-3: Bit Rates for (a) Fixed Quantization and (b) Bit Rate Control.


Figure IV-4 quantifies the progressive improvement in quality of the reconstructed video, based on pSNR, for the same 150-frame layered video segment illustrated in Figure IV-1 and Figure IV-2. At the beginning of each sequence, quality ramps up over the aging interval following a scene change. After this period, quality remains relatively flat for each sequence regardless of the number of layers. For the motion video sequence, the base layer provides a smoothed but acceptable display. Text is not readable, but the speaker’s movements are easy to follow. Adding the first enhancement layer improves sharpness and adds a 4 dB improvement in pSNR, although small text is still difficult to discern. The second enhancement layer adds only a 1-2 dB improvement, but small text is clearly readable and other features with fine edges are sharper. With static video, the role of the enhancement layers is even more dramatic. Even though most of the macroblock’s energy is included in the base layer and contributions from each frequency band are included, the base layer still shows much softness, although the shapes are readily identifiable. The first enhancement layer adds a 7 dB improvement and dramatically improves sharpness. The final layer, even though its bit rate contribution is the smallest of the three layers, almost doubles the pSNR, and the reconstructed frame is virtually identical to the original frame. Neglecting the initial frame, the average pSNR utilizing all layers for the motion video sequence without the control scheme is 29.5 dB with a standard deviation of 1.7 dB. With the control scheme, the average and standard deviation are 29.8 dB and 1.9 dB, respectively. Note that these values include the ramp up in quality following the initial scene change. These average pSNR values are not directly comparable, however, because of the difference in average bit rates; the higher quality, rate-controlled approach uses approximately 500 more bits per frame. But since this quality is achieved within the desired bit budget of 80 kbps, the rate controller makes more effective use of the available bandwidth to improve image fidelity. The statistics obtained from the motion video sequence traces in Figure IV-3 and Figure IV-4 are summarized in Table IV-1.
Figure IV-4: pSNR for (a) Fixed Quantization and (b) Bit Rate Control.

Statistic              Rate Controlled   Uncontrolled
Mean Bit Rate (bpf)    7988              7454
Bit Rate STD (bpf)     942               1362
Mean pSNR (dB)         29.83             29.5
pSNR STD (dB)          1.92              1.74

Table IV-1: Rate Controlled and Uncontrolled Motion Video Sequence Statistics.


Additionally, the layered video codec’s resilience to bit errors introduced during transmission was examined using the same motion video sequence as above. Using loss rates of 10%, 25%, and 50%, four different cases were tested. Case one distributed the bit errors across the layers in proportion to their contribution to the total bit rate and utilized zero-order error concealment; the reconstructed frame retained the content of the previous frame for the portion lost during transmission. If the loss occurred in the base layer, the enhancement layers were neglected. If the loss occurred in an enhancement layer, the reconstruction was performed utilizing the base layer and the other enhancement layer. Case two treated all layers as a single video stream and utilized zero-order error concealment; that is, a transmission loss was a loss for all layers. Cases three and four are identical to cases one and two, respectively, except that no error concealment was used. Instead, an information loss caused the decoder to assign the value of zero to all coefficients in the affected layers. As Figure IV-5 illustrates, spreading bit errors across multiple layers has less negative impact on the reconstructed image at high loss rates.
Figure IV-5: Comparison of Error Resilience.


Considering static slide sequences, the relationship is more complicated. In general, since the frame content is entirely static, the best reconstruction in an error-prone environment is to forgo a macroblock update if the update would be made with fewer layers than the present reconstruction of that macroblock. Further study into the implementation of such a scheme is warranted.


This chapter presented representative frames from a video segment consisting of motion video and static slides. For each type of content, the original frame and the reconstructions with one, two, and three layers were given. Also presented were plots of the bits per frame and pSNR as a function of frame number for both fixed quantization and variable quantization using the bit rate control scheme. These plots served to quantify the quality depicted in the reconstructions and to illustrate the bit allocation among the layers. Finally, the effect of spreading bit errors across layers, compared to confining them to a single video stream, was presented.
V. CONCLUDING REMARKS


A. CONCLUSIONS 


This thesis has presented the development and implementation of a new layered 
video codec proposed by Parker [9] that emphasizes robust transmission of a video 
teleconference (VTC) at low bit rates. The dual nature of the targeted scene content — 
low-motion video and static slide sequences consisting of text and line drawings — and 
the assumed application in a multicast, heterogeneous, wireless network environment 
required significant flexibility in the implementation. The essential features of the coder 
are summarized as follows.

Frame refreshment is accomplished via block updating and an aging algorithm, 
both applied at the macroblock level. This approach promotes greater robustness because 
spatial error propagation is eliminated and temporal error propagation is greatly limited. 
The combined technique captures perceptual changes due to motion within the scene, 
limits the duration of error artifacts in the reconstruction at receivers, and ensures that 
new participants in a VTC session that is already in progress receive a complete frame in 
a timely manner. 
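A minimal sketch of block updating combined with an aging algorithm follows. The selection routine `m_blk_id_xr` is not reproduced in this section, so the details are assumptions: a macroblock is selected when its difference from the previous frame (assumed here to be a sum of absolute differences) exceeds a threshold, as with `threshold = 160` in Appendix C, or when its age counter reaches a hypothetical limit `max_age`. Seeding the counters randomly, as the listing appears to do with `RV = floor(21*rand(99,1))`, would stagger the forced refreshes across frames; that reading is also an assumption.

```python
def select_macroblocks(prev, curr, ages, block=16, threshold=160, max_age=20):
    """Return one 0/1 flag per macroblock (raster order) marking blocks to send.

    prev, curr: 2-D lists of luminance values whose sizes are multiples of `block`.
    ages: one age counter per macroblock, updated in place.
    max_age is a hypothetical refresh limit, not a value taken from the thesis.
    """
    rows, cols = len(curr), len(curr[0])
    flags = []
    k = 0
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            # Block updating: measure the change since the previous frame.
            sad = sum(
                abs(curr[r][c] - prev[r][c])
                for r in range(r0, r0 + block)
                for c in range(c0, c0 + block)
            )
            # Aging: an unchanged block is still sent once it grows too old.
            if sad > threshold or ages[k] >= max_age:
                flags.append(1)
                ages[k] = 0  # a transmitted block starts aging again
            else:
                flags.append(0)
                ages[k] += 1
            k += 1
    return flags
```

Because every macroblock is refreshed at least once every `max_age + 1` frames, a receiver joining mid-session (or recovering from a loss) obtains a complete frame within that interval.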

The macroblocks selected for transmission are decomposed in frequency using the
fast Haar transform (FHT). For motion video sequences, the lowpass subband is further
processed with the two-dimensional discrete cosine transform (DCT). The horizontal and
vertical edge detail subbands are further decomposed with a repeated application of the
FHT. Static slide sequences are decomposed solely by a second-order FHT analysis.
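One level of a two-dimensional Haar analysis, with its inverse, can be sketched as follows for a square block of even size. The normalization (sums and differences divided by 4) and the subband naming convention are assumptions; the `fht` and `remake_3` routines of Appendix C are not reproduced here. Applied to a 16 x 16 macroblock this yields four 8 x 8 subbands, and applying it again to a subband yields 4 x 4 bands, matching the block sizes 16, 8, and 4 passed to `fht` in the listing.

```python
def haar_2d(block):
    """One level of 2-D Haar analysis on an n x n block (n even).
    Returns (ll, lh, hl, hh), each (n/2) x (n/2). Naming convention (assumed):
    first letter = vertical filter, second letter = horizontal filter."""
    h = len(block) // 2
    ll = [[0.0] * h for _ in range(h)]
    lh = [[0.0] * h for _ in range(h)]
    hl = [[0.0] * h for _ in range(h)]
    hh = [[0.0] * h for _ in range(h)]
    for i in range(h):
        for j in range(h):
            a, b = block[2 * i][2 * j], block[2 * i][2 * j + 1]
            c, d = block[2 * i + 1][2 * j], block[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 4  # local average
            lh[i][j] = (a - b + c - d) / 4  # horizontal change
            hl[i][j] = (a + b - c - d) / 4  # vertical change
            hh[i][j] = (a - b - c + d) / 4  # diagonal change
    return ll, lh, hl, hh


def inverse_haar_2d(ll, lh, hl, hh):
    """Exact inverse of haar_2d: rebuilds each 2 x 2 group of pixels."""
    h = len(ll)
    out = [[0.0] * (2 * h) for _ in range(2 * h)]
    for i in range(h):
        for j in range(h):
            s, x, y, z = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            out[2 * i][2 * j] = s + x + y + z
            out[2 * i][2 * j + 1] = s - x + y - z
            out[2 * i + 1][2 * j] = s + x - y - z
            out[2 * i + 1][2 * j + 1] = s - x - y + z
    return out
```

A flat region produces zeros in all three detail bands, which is what makes the detail subbands cheap to code for low-motion and slide content.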

The lowpass subband of motion video is quantized and encoded using the JPEG
standard in order to exploit the perceptual characteristics of the human visual system.
The remaining subbands of motion video sequences are subjected to uniform quantization
and encoding with a custom variable length coding (VLC) table. All subbands of static
slides are subjected to uniform quantization and encoding with a separate custom VLC
table. All quantization is performed using a triplet of quantizers, and each subband is
quantized with one of the triplet parameters based on the variance of its coefficients.
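The uniform quantizer is `round(x/q)` in the Appendix C listing, with `x*q` at the decoder. The sketch below mirrors it for static slides, using the subband-to-quantizer assignment read from that listing and the static triplet (4, 16, 16) set there. Python's built-in `round` rounds halves to even, so a half-away-from-zero helper is used to match MATLAB's `round`. The function and table names are illustrative.

```python
import math

# Static-slide subband -> index into the quantizer triplet (q1, q2, q3),
# as read from the quantizing section of Appendix C.
STATIC_SLIDE_QUANTIZER = {
    "ll_ll": 0,
    "ll_lh": 1, "ll_hl": 1, "lh_ll": 1, "lh_lh": 1, "hl_ll": 1, "hl_hl": 1,
    "ll_hh": 2, "lh_hl": 2, "lh_hh": 2, "hl_lh": 2, "hl_hh": 2,
    "hh_ll": 2, "hh_lh": 2, "hh_hl": 2, "hh_hh": 2,
}


def matlab_round(x: float) -> int:
    """Round half away from zero, like MATLAB's round."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize(coeffs, q):
    return [matlab_round(c / q) for c in coeffs]


def dequantize(levels, q):
    return [level * q for level in levels]


def quantize_subbands(subbands, triplet):
    """subbands: dict name -> list of coefficients; triplet: (q1, q2, q3)."""
    return {name: quantize(vals, triplet[STATIC_SLIDE_QUANTIZER[name]])
            for name, vals in subbands.items()}
```

Low-variance subbands receive the coarse step (q3 = 16 here) and the lowest band receives the fine step (q1 = 4).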
Subbands are assigned to layers by grouping bands of coefficients with similar
variances into a common layer. Three layers are used in the coder. The base layer is 
independently decodable and yields an acceptable, minimum-quality reconstruction; each 
enhancement layer progressively improves the quality of the reconstruction. 
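The motion-video layer assignment used in Appendix C (`layer1`, `layer2`, `layer3`) amounts to a lookup from subband to layer. The sketch below uses that lookup to total bits per layer and to form a reduced reconstruction by zeroing every subband of a missing layer, which is how the listing builds its one- and two-layer reconstructions and how lost layers are modeled in the error-resilience comparison. Subbands are plain coefficient lists here, and the helper names are hypothetical.

```python
# Motion-video subband -> layer, as read from the layer1/layer2/layer3 sums.
MOTION_VIDEO_LAYERS = {
    "ll": 1,
    "lh_ll": 2, "lh_lh": 2, "hl_ll": 2, "hl_hl": 2,
    "hh": 3, "lh_hl": 3, "lh_hh": 3, "hl_lh": 3, "hl_hh": 3,
}


def bits_per_layer(subband_bits):
    """subband_bits: dict subband name -> bits spent on it in one frame."""
    totals = {1: 0, 2: 0, 3: 0}
    for name, bits in subband_bits.items():
        totals[MOTION_VIDEO_LAYERS[name]] += bits
    return totals


def keep_layers(subbands, layers_received):
    """Zero all coefficients of subbands whose layer is not in layers_received."""
    return {name: (vals if MOTION_VIDEO_LAYERS[name] in layers_received
                   else [0] * len(vals))
            for name, vals in subbands.items()}
```

A receiver subscribed to layers {1, 2} would call `keep_layers(bands, {1, 2})` before the inverse transforms.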

Bit rate is controlled at the frame level by selecting the quantizer triplet to be used
in the current frame based on the number of bits used in the previous frame, the desired
average bit rate, and the number of macroblocks selected for transmission. The
implementation is a simple table lookup that results from the optimal one-dimensional
reduction of a four-dimensional control surface.
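The quantities feeding the lookup can be sketched from the Appendix C listing: the per-frame budget `B_bar = br*1024/10`, the initial table entry `min(find(B_bar > QD_TABLE(:,4)))`, and the mismatch `delta_Binter`, which scales the previous frame's bits by the ratio of macroblocks selected now to those selected then. The routine `get_qd_entry` that maps the mismatch to a new table row is not shown in this section, so that step is omitted. The table rows in the usage below are made-up values.

```python
def target_bits_per_frame(kbps, frames_per_second=10):
    """B_bar = br * 1024 / 10 in the listing (bits per frame at 10 frames/s)."""
    return kbps * 1024 / frames_per_second


def initial_entry(qd_table, b_bar):
    """First row whose 4th column (average bits per frame) is below the target,
    as in min(find(B_bar > QD_TABLE(:,4))). Returns a 0-based index or None."""
    for k, row in enumerate(qd_table):
        if b_bar > row[3]:
            return k
    return None


def inter_frame_mismatch(b_bar, n_selected_now, n_selected_prev, bits_prev):
    """delta_Binter = B_bar - (N_now / N_prev) * B_prev."""
    return b_bar - (n_selected_now / n_selected_prev) * bits_prev


# Hypothetical rows: (q1, q2, q3, average bits per frame).
TOY_QD_TABLE = [(8, 32, 32, 9000), (16, 64, 64, 5000), (32, 128, 128, 2500)]
```

A positive mismatch means the current frame is predicted to come in under budget, so a finer triplet can be afforded; a negative one calls for a coarser triplet.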
B. FUTURE WORK


The implementation and the results presented here suggest that the layered video
codec has potential practical utility for video teleconferencing in multicast,
heterogeneous, wireless networks. Several aspects of the coder merit further work. For
example, although three layers were used in the present implementation, the techniques
employed can be used to scale the coder to include an arbitrary number of layers.
Techniques to change the layering scheme dynamically within a sequence are also
desirable. The ability to handle color and audio needs to be incorporated into the code.
Further refinement of the block search pattern used for macroblock selection and the
possibility of rate control at the macroblock level can be evaluated. Also, for static
slide sequences, the ability to detect and reconstruct slight movement within a frame,
such as the movement of a cursor, warrants investigation. Finally, implementation in a
high-level language or in hardware is another potential future task.
APPENDIX A. MOTION VIDEO VARIABLE LENGTH CODING TABLE (VLC)


The following is the custom VLC table used with motion video sequences. The last
character of each codeword, s, denotes the appended sign bit.
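Each table entry maps a (LAST, RUN, LEVEL) event to a codeword, and the sign of LEVEL travels as one extra bit after the codeword. The sketch below shows that encode/decode pattern with a tiny made-up prefix-free table, not the thesis tables, and assumes s = 0 for a positive level and s = 1 for a negative one. All names are hypothetical.

```python
# Made-up prefix-free table: (LAST, RUN, |LEVEL|) -> codeword without sign bit.
TOY_VLC = {
    (0, 0, 1): "1",
    (0, 1, 1): "01",
    (1, 0, 1): "001",
}
DECODE = {code: event for event, code in TOY_VLC.items()}


def encode(events):
    """events: list of (last, run, level) with a nonzero signed level."""
    bits = []
    for last, run, level in events:
        bits.append(TOY_VLC[(last, run, abs(level))])
        bits.append("1" if level < 0 else "0")  # appended sign bit s (assumed polarity)
    return "".join(bits)


def decode(bitstring):
    """Read bits until a codeword matches, then read its one sign bit."""
    events, prefix, i = [], "", 0
    while i < len(bitstring):
        prefix += bitstring[i]
        i += 1
        if prefix in DECODE:
            if i >= len(bitstring):
                raise ValueError("missing sign bit")
            last, run, mag = DECODE[prefix]
            events.append((last, run, -mag if bitstring[i] == "1" else mag))
            i += 1
            prefix = ""
    if prefix:
        raise ValueError("bitstream ended inside a codeword")
    return events
```

Keeping the sign outside the table halves the number of entries, since +LEVEL and -LEVEL share one codeword.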


[Table columns: INDEX, LAST, RUN, LEVEL, BITS, CODEWORD. Only the CODEWORD column is legible in the source scan and is listed below.]

CODEWORD
11s
0101s
10010s
101000111s
10111001101s
1011111001110s
0100s
100111s
001000101s
10111110000s
1011010010000111s
1011010010100s
0000s
1011001s
01110001010s
001000110110s
01111s
01100100s
1011100101s
101101011001s
00011s
01110010s
101101010s
00101101001111s
10010s
000101101s
10111110001s
101101001000111s
011101s
000101110s
0111000100000s
101101001000110s
011011s
001000100s
1011111001111s
101101001000001s
011000s
00100100s
001011011s
0111000110001s
001011111s
0010011010s
1001101011000s
1011011s
1001101001s
0010011011010s
101101001000000s
00101100s
00100011100s
1011010010101s
0010000s
011100011s
101111100100s
0011101s
0001011
10111000s
10111111111s
00101101001100s
00100101s
001000110111s
1011010010011010s
011001010s
1011010001011s
101000100s
10011011110010s
0001011110s
0110010110011111s
1010001011s
1010001010s
1011010011s
0110010110011110s
01100101101s
01110001011s
101101001011101s
00010111111s
00101101001101s
0010001100s
101101001011100s
1011010010000100s
10011s
01011111s
1001s
000101000s
110s
0111001110s
0110100011s
011111111001s
01011010010100s
01110011110s
011010010011101s
111000100100s
11001011000s
011010010011100s
01110010001s
0110101101s
111000100111s
01101001011110s
00110111011s
0011010000s
0010110010000s
11001011111s
0110100001s
011100110010s
111000100110s
001101011100s
01101001011001s
011111001100s
01011010010111s
0001011001101s
101111111000s
00010110010011s
0001011001100s
001011010010001s
10011011110011s
011001011001110s
000101100111000s
0001011001110010s
00101101000000s
0011s
011010s
1011000s
0010011000s
00101101001001s
10001s
01s
10101s
0001011s
010000110s
1011110s
00010110010001s
00010110010010s
0001001s
100110101101s
0110011s
10011011111s
1001101011001s
0001010s
101110010010s
101101001011011s
01110011s
001011010000111s
1011010010011000s
1001100s
00101101000010s
1010000s
1001101011110s
1011010010011011s
1011101s
001011010001s
101101001011010s
00100111s
0111000100001s
01110000s
00010110011100111s
1011010010000101s
0001000s
011001011110s
0001011111011s
00101101001110s
1011010111415.
001011010010101s
011001011100s
0001011111010s
001000110101s
001001101100s
00010110011100110s
1011111111011s
001000111111s
00010110001s
10110100001001s
0110010110010s
010000101001s
1011000s
0111011s
0110011s
0111000100101s
0010001110101s
0010011011011s
101110011111s
001001100101s
0010001110100s
100110100010s
1001101111000s
0010001110111s
00010110010101s
10110100000110s
101110010000s
101111100101s
001000110100s
1011010010011111s
1001101011101s
00101101000001s
1011111001101s
0010110010100s
00110111010s
011010001001s
011010010011110s
01111111011s
011010010011001s
0111111101s
01011010010110s
01110010011s
0001011001111s
00010110010111s
00010110010110s
1011010001000s
1001101011111s
0010001110110s
APPENDIX B. STATIC SLIDE VARIABLE LENGTH CODING TABLE (VLC)


The following is the custom VLC table used with static slide sequences. The last
character of each codeword, s, denotes the appended sign bit.


[Table columns: INDEX, LAST, RUN, LEVEL, BITS, CODEWORD. Only the following CODEWORD entries are legible in the source scan.]

CODEWORD
0110011s
0010101s
00100010s
010s
11100s











APPENDIX C. MATLAB CODE LIBRARY 


This appendix contains the MATLAB code used in the layered video codec. The
main code block, thesis.m, is provided first, and the supporting functions follow in
alphabetical order. As provided, thesis.m processes four video sequences: two motion
video sequences of 100 frames each, followed by two static slide sequences of 50 frames
each.
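The listing repeatedly scans quantized blocks into vectors (zig-zag for the DCT-coded LL band, horizontal raster via `raster`, vertical raster via `vertical`), locates the last nonzero entry ("inf" for an all-zero block), drops trailing zeros (`make_it_compact`), and parses the result into events (`parse_3D`). Those function bodies are not reproduced in this part of the appendix, so the sketch below assumes they behave as their comments describe and that `parse_3D` emits (LAST, RUN, LEVEL) triples, as the VLC table columns suggest. Indices are 0-based here, unlike MATLAB.

```python
import math


def zigzag_order(n):
    """(row, col) pairs in JPEG-style zig-zag order for an n x n block."""
    order = []
    for s in range(2 * n - 1):
        cells = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even anti-diagonals run bottom-left to top-right, odd ones the reverse.
        order.extend(reversed(cells) if s % 2 == 0 else cells)
    return order


def scan(block, method):
    n = len(block)
    if method == "zigzag":
        return [block[r][c] for r, c in zigzag_order(n)]
    if method == "horizontal":  # row by row
        return [v for row in block for v in row]
    if method == "vertical":  # column by column
        return [block[r][c] for c in range(n) for r in range(n)]
    raise ValueError(f"unknown scan method: {method}")


def last_nonzero(vec):
    """0-based index of the last nonzero entry, or math.inf if all zero."""
    for k in range(len(vec) - 1, -1, -1):
        if vec[k] != 0:
            return k
    return math.inf


def run_level_events(vec):
    """(LAST, RUN, LEVEL) triples for one scanned block; trailing zeros dropped."""
    end = last_nonzero(vec)
    if end == math.inf:
        return []
    events, run = [], 0
    for k in range(end + 1):
        if vec[k] == 0:
            run += 1
        else:
            events.append([0, run, vec[k]])
            run = 0
    events[-1][0] = 1  # flag the final event in the block
    return [tuple(e) for e in events]
```

The vertical scan suits the HL bands and the horizontal scan the LH and HH bands because each orders coefficients so that the likely nonzero values come first, shortening runs.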


% thesis

format compact
clear all
close all

global QD_TABLE VLC_DYN VLC_STA RV HUFF_TABLE LAST
load('g:\QD_TABLE')
load('g:\VLC_dyn')
load('g:\VLC_sta')

VLC_DYN = VLC_dyn; clear VLC_dyn   % gets custom VLC table for motion video
VLC_STA = VLC_sta; clear VLC_sta   % gets custom VLC table for static slides

frame_type = 0;                    % initialize (zero means dynamic scene)
m_mat_ndx = [];                    % initialize (track macroblocks selected)
f_vec = zeros(1584,16);            % initialize (reshaped frame)

rand('state',0)                    % seed for reproducibility
RV = floor(21*rand(99,1));         % for aging algorithm
HUFF_TABLE = make_HAC_table;       % gets JPEG standard table
LAST = ones(99,1);                 % sets block selection to 1 initially
threshold = 160;                   % for use with asd
count = -1;                        % where is it in the loop?
mse = 1;                           % calculate MSE?
display = 1;                       % show the images?
write = 1;                         % write to file for evaluation?
last = 299;                        % last frame # to consider
toff = 0;                          % initialize (used with scene change)

f_far_ll = zeros(792,8);           % initialize for decoder
f_far_lh = zeros(792,8);           % initialize for decoder
f_far_hl = zeros(792,8);           % initialize for decoder
f_far_hh = zeros(792,8);           % initialize for decoder
f_far_ll_p1 = zeros(792,8);        % initialize for decoder
f_far_lh_p1 = zeros(792,8);        % initialize for decoder
f_far_hl_p1 = zeros(792,8);        % initialize for decoder
f_far_hh_p1 = zeros(792,8);        % initialize for decoder
f_far_ll_p2 = zeros(792,8);        % initialize for decoder
f_far_lh_p2 = zeros(792,8);        % initialize for decoder
f_far_hl_p2 = zeros(792,8);        % initialize for decoder
f_far_hh_p2 = zeros(792,8);        % initialize for decoder
f_1 = zeros(1584,16);              % initialize for decoder
f_2 = zeros(1584,16);              % initialize for decoder
f_3 = zeros(1584,16);              % initialize for decoder
beta = -11.346;                    % slope of rate control curve

br = input('Enter the target bitrate (Kbps) \n >> ')

B_bar = br * 1024/10;              % target average bits per frame
qd_entry = min(find(B_bar > QD_TABLE(:,4)));   % initial triplet based on test averages

for i = 0:last                     % loop through frames

count = count+1

if (i<=99)
i_present = get_next_image(i);          % read head#.bmp
elseif ((i>=100) & (i<=199))
i_present = get_next_image_2(i-100);    % read ncaa#.bmp
elseif (i==200)
i_present = get_next_image(101);        % read busy text slide
elseif (i==250)
i_present = get_next_image(102);        % read line drawing slide
else
i_present;                              % keep the current slide
end
if display                              % displays original if desired
figure;
subplot(2,2,1)
image(i_present)
colormap(gray(256))
title('Original')
axis off
end


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Coder %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%% Change dimensions of image and identify macroblocks by threshold %%%%%%%%

f_last = f_vec;                         % buffers previous frame
f_vec = shape(i_present);               % shapes current frame
[m_vec_ndx, frame_type] = m_blk_id_xr(f_last,f_vec,threshold);   % compare current to last

m_mat_ndx = [m_mat_ndx,m_vec_ndx];      % matrix of MBs selected
if (sum(m_vec_ndx) >= 65)               % triggers a scene change
toff = 1;                               % "trigger" flag set "on"
end
if (toff)                               % scene change just occurred
toff = 0;                               % resets "trigger" flag
flag = 1;                               % "flag" for 1st frame after scene change
q1 = 64;                                % heavily compressed scene change
q2 = 1000;
q3 = 1000;
elseif frame_type
q1 = 4;                                 % triplet for static sequence
q2 = 16;
q3 = 16;
else                                    % a dynamic frame sequence
delta_Binter = B_bar - (sum(m_vec_ndx)/sum(m_mat_ndx(:,i))) * total(i);
qd_entry = get_qd_entry(flag,B_bar,qd_entry,delta_Binter,sum(m_vec_ndx),beta);
flag = 0;                               % resets flag
q1 = QD_TABLE(qd_entry,1);              % rate control triplets fetched
q2 = QD_TABLE(qd_entry,2);
q3 = QD_TABLE(qd_entry,3);
end
Q = get_Q1_matrix(q1);                  % quantizer matrix (via JPEG standard)
if ~frame_type                          % if dynamic frame sequence

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Transforms %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[f_fht_ll,f_fht_lh,f_fht_hl,f_fht_hh] = fht(f_vec,m_vec_ndx,16);   % FHT of frame

f_fht_ll = f_fht_ll - 128;                      % level shift of LL
f_fht_ll = dct_of_fht(f_fht_ll,m_vec_ndx);      % 2-D DCT of LL

[f_fht_lh_ll,f_fht_lh_lh,f_fht_lh_hl,f_fht_lh_hh] = ...
fht(f_fht_lh,m_vec_ndx,8);                      % subband LH FHT

[f_fht_hl_ll,f_fht_hl_lh,f_fht_hl_hl,f_fht_hl_hh] = ...
fht(f_fht_hl,m_vec_ndx,8);                      % subband HL FHT


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Quantizing %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
f_fht_ll_q = quantizer_ll(f_fht_ll,Q,m_vec_ndx);   % quantize LL

f_fht_hh_q = round(f_fht_hh/q3);                   % quantize HH

f_fht_lh_ll_q = round(f_fht_lh_ll/q2);             % quantize LH subbands
f_fht_lh_lh_q = round(f_fht_lh_lh/q2);
f_fht_lh_hl_q = round(f_fht_lh_hl/q3);
f_fht_lh_hh_q = round(f_fht_lh_hh/q3);

f_fht_hl_ll_q = round(f_fht_hl_ll/q2);             % quantize HL subbands
f_fht_hl_lh_q = round(f_fht_hl_lh/q3);
f_fht_hl_hl_q = round(f_fht_hl_hl/q2);
f_fht_hl_hh_q = round(f_fht_hl_hh/q3);


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Working on LL %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% zig-zag scans each LL 8x8 (Results are one 6336x1 vector [ll_zz] and the index
% of the last non-zero entity [last_ll_zz]. Get "inf" if a group of 64 is all
% zeros.)
[last_ll_zz,ll_zz] = zzb(f_fht_ll_q,8);

% gets rid of trailing zeros (one big vector of varying size)
a_ll_zz = make_it_compact(ll_zz,last_ll_zz,8);

% parsing LL with Huffman routine
parsed_ll_zz = parse_Huff(a_ll_zz,last_ll_zz);

% gets bits per frame due to LL
bits_ll_zz(i+1) = get_bits_Huff(parsed_ll_zz);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Working on HH %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% scans each HH 8x8 (Results are one 6336x1 vector [hh_r] and the index of the last
% non-zero entity [last_hh_r], where r indicates the horizontal raster scan
% method. Get "inf" if a group of 64 is all zeros.)
[last_hh_r,hh_r] = raster(f_fht_hh_q,8);

% gets rid of trailing zeros (one big vector of varying size)
a_hh_r = make_it_compact(hh_r,last_hh_r,8);

% parsing HH and getting the number of parsings for later use in eval_thesis to
% get the percent of time that the default bit number is used.
parsed_hh_r_3D = parse_3D(a_hh_r,last_hh_r);
l_p_hh_r_3D(i+1) = length(parsed_hh_r_3D(:,1));

% gets bits per frame and number of times the default is chosen
% (includes 99 COD bits)
[bits_hh_r_3D(i+1), hh_r_22(i+1)] = get_bits2(parsed_hh_r_3D,frame_type);


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%% Working on LH subbands %%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%% Doing the LH_LL subband %%%%%%%%%%%%%%%%%%%%%%%%%

% scans each LH_LL 4x4 (Results are one 1584x1 vector [lh_ll_r] and the index of
% the last non-zero entity [last_lh_ll_r], where the r indicates the
% horizontal raster scan method. Get "inf" if a group of 16 is all zeros.)
[last_lh_ll_r,lh_ll_r] = raster(f_fht_lh_ll_q,4);

% gets rid of trailing zeros (one big vector of varying size)
a_lh_ll_r = make_it_compact(lh_ll_r,last_lh_ll_r,4);

% parsing LH_LL and getting the number of parsings for later use in eval_thesis to
% get the percent of time that the default bit number is used.
parsed_lh_ll_r_3D = parse_3D(a_lh_ll_r,last_lh_ll_r);
l_p_lh_ll_r_3D(i+1) = length(parsed_lh_ll_r_3D(:,1));

% gets bits per frame and number of times the default is chosen
% (includes 99 COD bits)
[bits_lh_ll_r_3D(i+1), lh_ll_r_22(i+1)] = get_bits2(parsed_lh_ll_r_3D,frame_type);

%%%%%%%%%%%%%%%%%%%%%%%%% Doing the LH_LH subband %%%%%%%%%%%%%%%%%%%%%%%%%

% scans each LH_LH 4x4 (Results are one 1584x1 vector [lh_lh_r] and the index of
% the last non-zero entity [last_lh_lh_r], where the r indicates the
% horizontal raster scan method. Get "inf" if a group of 16 is all zeros.)
[last_lh_lh_r,lh_lh_r] = raster(f_fht_lh_lh_q,4);

% gets rid of trailing zeros (one big vector of varying size)
a_lh_lh_r = make_it_compact(lh_lh_r,last_lh_lh_r,4);

% parsing LH_LH and getting the number of parsings for later use in eval_thesis to
% get the percent of time that the default bit number is used.
parsed_lh_lh_r_3D = parse_3D(a_lh_lh_r,last_lh_lh_r);
l_p_lh_lh_r_3D(i+1) = length(parsed_lh_lh_r_3D(:,1));

% gets bits per frame and number of times the default is chosen
% (includes 99 COD bits)
[bits_lh_lh_r_3D(i+1), lh_lh_r_22(i+1)] = get_bits2(parsed_lh_lh_r_3D,frame_type);

%%%%%%%%%%%%%%%%%%%%%%%%% Doing the LH_HL subband %%%%%%%%%%%%%%%%%%%%%%%%%

% scans each LH_HL 4x4 (Results are a 1584x1 vector [lh_hl_r] and the index of
% the last non-zero entity [last_lh_hl_r], where the r indicates the
% horizontal raster scan method. Get "inf" if a group of 16 is all zeros.)
[last_lh_hl_r,lh_hl_r] = raster(f_fht_lh_hl_q,4);

% gets rid of trailing zeros (one big vector of varying size)
a_lh_hl_r = make_it_compact(lh_hl_r,last_lh_hl_r,4);

% parsing LH_HL and getting the number of parsings for later use in eval_thesis to
% get the percent of time that the default bit number is used.
parsed_lh_hl_r_3D = parse_3D(a_lh_hl_r,last_lh_hl_r);
l_p_lh_hl_r_3D(i+1) = length(parsed_lh_hl_r_3D(:,1));

% gets bits per frame and number of times the default is chosen
% (includes 99 COD bits)
[bits_lh_hl_r_3D(i+1), lh_hl_r_22(i+1)] = get_bits2(parsed_lh_hl_r_3D,frame_type);

%%%%%%%%%%%%%%%%%%%%%%%%% Doing the LH_HH subband %%%%%%%%%%%%%%%%%%%%%%%%%

% scans each LH_HH 4x4 (Results are a 1584x1 vector [lh_hh_r] and the index of
% the last non-zero entity [last_lh_hh_r], where the r indicates the
% horizontal raster scan method. Get "inf" if a group of 16 is all zeros.)
[last_lh_hh_r,lh_hh_r] = raster(f_fht_lh_hh_q,4);

% gets rid of trailing zeros (one big vector of varying size)
a_lh_hh_r = make_it_compact(lh_hh_r,last_lh_hh_r,4);

% parsing LH_HH and getting the number of parsings for later use in eval_thesis to
% get the percent of time that the default bit number is used.
parsed_lh_hh_r_3D = parse_3D(a_lh_hh_r,last_lh_hh_r);
l_p_lh_hh_r_3D(i+1) = length(parsed_lh_hh_r_3D(:,1));

% gets bits per frame and number of times the default is chosen
% (includes 99 COD bits)
[bits_lh_hh_r_3D(i+1), lh_hh_r_22(i+1)] = get_bits2(parsed_lh_hh_r_3D,frame_type);


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%% Working on HL subbands %%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%% Doing the HL_LL subband %%%%%%%%%%%%%%%%%%%%%%%%%

% scans each HL_LL 4x4 (Results are a 1584x1 vector [hl_ll_v] and the index of
% the last non-zero entity [last_hl_ll_v], where the v indicates the
% vertical raster scan method. Get "inf" if a group of 16 is all zeros.)
[last_hl_ll_v,hl_ll_v] = vertical(f_fht_hl_ll_q,4);

% gets rid of trailing zeros (one big vector of varying size)
a_hl_ll_v = make_it_compact(hl_ll_v,last_hl_ll_v,4);

% parsing HL_LL and getting the number of parsings for later use in eval_thesis to
% get the percent of time that the default bit number is used.
parsed_hl_ll_v_3D = parse_3D(a_hl_ll_v,last_hl_ll_v);
l_p_hl_ll_v_3D(i+1) = length(parsed_hl_ll_v_3D(:,1));

% gets bits per frame and number of times the default is chosen
% (includes 99 COD bits)
[bits_hl_ll_v_3D(i+1), hl_ll_v_22(i+1)] = get_bits2(parsed_hl_ll_v_3D,frame_type);

%%%%%%%%%%%%%%%%%%%%%%%%% Doing the HL_LH subband %%%%%%%%%%%%%%%%%%%%%%%%%

% scans each HL_LH 4x4 (Results are a 1584x1 vector [hl_lh_v] and the index of
% the last non-zero entity [last_hl_lh_v], where the v indicates the
% vertical raster scan method. Get "inf" if a group of 16 is all zeros.)
[last_hl_lh_v,hl_lh_v] = vertical(f_fht_hl_lh_q,4);

% gets rid of trailing zeros (one big vector of varying size)
a_hl_lh_v = make_it_compact(hl_lh_v,last_hl_lh_v,4);

% parsing HL_LH and getting the number of parsings for later use in eval_thesis to
% get the percent of time that the default bit number is used.
parsed_hl_lh_v_3D = parse_3D(a_hl_lh_v,last_hl_lh_v);
l_p_hl_lh_v_3D(i+1) = length(parsed_hl_lh_v_3D(:,1));

% gets bits per frame and number of times the default is chosen
% (includes 99 COD bits)
[bits_hl_lh_v_3D(i+1), hl_lh_v_22(i+1)] = get_bits2(parsed_hl_lh_v_3D,frame_type);

%%%%%%%%%%%%%%%%%%%%%%%%% Doing the HL_HL subband %%%%%%%%%%%%%%%%%%%%%%%%%

% scans each HL_HL 4x4 (Results are a 1584x1 vector [hl_hl_v] and the index of
% the last non-zero entity [last_hl_hl_v], where the v indicates the
% vertical raster scan method. Get "inf" if a group of 16 is all zeros.)
[last_hl_hl_v,hl_hl_v] = vertical(f_fht_hl_hl_q,4);

% gets rid of trailing zeros (one big vector of varying size)
a_hl_hl_v = make_it_compact(hl_hl_v,last_hl_hl_v,4);

% parsing HL_HL and getting the number of parsings for later use in eval_thesis to
% get the percent of time that the default bit number is used.
parsed_hl_hl_v_3D = parse_3D(a_hl_hl_v,last_hl_hl_v);
l_p_hl_hl_v_3D(i+1) = length(parsed_hl_hl_v_3D(:,1));

% gets bits per frame and number of times the default is chosen
% (includes 99 COD bits)
[bits_hl_hl_v_3D(i+1), hl_hl_v_22(i+1)] = get_bits2(parsed_hl_hl_v_3D,frame_type);

%%%%%%%%%%%%%%%%%%%%%%%%% Doing the HL_HH subband %%%%%%%%%%%%%%%%%%%%%%%%%

% scans each HL_HH 4x4 (Results are a 1584x1 vector [hl_hh_v] and the index of
% the last non-zero entity [last_hl_hh_v], where the v indicates the
% vertical raster scan method. Get "inf" if a group of 16 is all zeros.)
[last_hl_hh_v,hl_hh_v] = vertical(f_fht_hl_hh_q,4);

% gets rid of trailing zeros (one big vector of varying size)
a_hl_hh_v = make_it_compact(hl_hh_v,last_hl_hh_v,4);

% parsing HL_HH and getting the number of parsings for later use in eval_thesis to
% get the percent of time that the default bit number is used.
parsed_hl_hh_v_3D = parse_3D(a_hl_hh_v,last_hl_hh_v);
l_p_hl_hh_v_3D(i+1) = length(parsed_hl_hh_v_3D(:,1));

% gets bits per frame and number of times the default is chosen
% (includes 99 COD bits)
[bits_hl_hh_v_3D(i+1), hl_hh_v_22(i+1)] = get_bits2(parsed_hl_hh_v_3D,frame_type);


% getting bits per frame per layer
layer1(i+1) = bits_ll_zz(i+1);

layer2(i+1) = bits_lh_ll_r_3D(i+1) + bits_lh_lh_r_3D(i+1) + ...
bits_hl_ll_v_3D(i+1) + bits_hl_hl_v_3D(i+1);

layer3(i+1) = bits_hh_r_3D(i+1) + bits_lh_hl_r_3D(i+1) + bits_lh_hh_r_3D(i+1) + ...
bits_hl_lh_v_3D(i+1) + bits_hl_hh_v_3D(i+1);

total(i+1) = layer1(i+1) + layer2(i+1) + layer3(i+1);


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%                                Channel                               %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Unquantize %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
f_fht_ll_uq = unquantize_ll(f_fht_ll_q,Q,m_vec_ndx);   % unquantize LL

f_fht_hh_uq = f_fht_hh_q * q3;                  % unquantize HH

f_fht_lh_ll_uq = f_fht_lh_ll_q * q2;            % unquantize LH subbands
f_fht_lh_lh_uq = f_fht_lh_lh_q * q2;
f_fht_lh_hl_uq = f_fht_lh_hl_q * q3;
f_fht_lh_hh_uq = f_fht_lh_hh_q * q3;

f_fht_hl_ll_uq = f_fht_hl_ll_q * q2;            % unquantize HL subbands
f_fht_hl_lh_uq = f_fht_hl_lh_q * q3;
f_fht_hl_hl_uq = f_fht_hl_hl_q * q2;
f_fht_hl_hh_uq = f_fht_hl_hh_q * q3;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Inverse Transform %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
f_far_ll = invdct_of_fht(f_fht_ll_uq,m_vec_ndx);    % inverse 2-D DCT of LL
f_far_ll = f_far_ll + 128;                          % level shift

f_far_lh = remake_3(f_far_lh,f_fht_lh_ll_uq,f_fht_lh_lh_uq,f_fht_lh_hl_uq, ...
f_fht_lh_hh_uq,m_vec_ndx,8);                    % LH subband inv FHT

f_far_hl = remake_3(f_far_hl,f_fht_hl_ll_uq,f_fht_hl_lh_uq,f_fht_hl_hl_uq, ...
f_fht_hl_hh_uq,m_vec_ndx,8);                    % HL subband inv FHT

f_far_hh = f_fht_hh_uq;

f_far_ll_p2 = f_far_ll;                         % LL p2 assignment
f_far_lh_p2 = remake_3(f_far_lh_p2,f_fht_lh_ll_uq,f_fht_lh_lh_uq, ...
0,0,m_vec_ndx,8);                               % LH p2 subband inv FHT
f_far_hl_p2 = remake_3(f_far_hl_p2,f_fht_hl_ll_uq,0, ...
f_fht_hl_hl_uq,0,m_vec_ndx,8);                  % HL p2 subband inv FHT
f_far_hh_p2 = 0;                                % HH p2 assignment

f_far_ll_p1 = f_far_ll;                         % LL p1 assignment
f_far_lh_p1 = 0;                                % LH p1 assignment
f_far_hl_p1 = 0;                                % HL p1 assignment
f_far_hh_p1 = 0;                                % HH p1 assignment

f_3 = remake_3(f_3,f_far_ll,f_far_lh,f_far_hl, ...
f_far_hh,m_vec_ndx,16);                         % frame inv FHT with 3 layers
f_2 = remake_3(f_2,f_far_ll_p2,f_far_lh_p2,f_far_hl_p2, ...
f_far_hh_p2,m_vec_ndx,16);                      % frame inv FHT with 2 layers
f_1 = remake_3(f_1,f_far_ll_p1,f_far_lh_p1,f_far_hl_p1, ...
f_far_hh_p1,m_vec_ndx,16);                      % frame inv FHT with 1 layer
else                                            % a static frame sequence

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Transforms %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[f_fht_ll,f_fht_lh,f_fht_hl,f_fht_hh] = fht(f_vec,m_vec_ndx,16);   % FHT of frame

[f_fht_ll_ll,f_fht_ll_lh,f_fht_ll_hl,f_fht_ll_hh] = ...
fht(f_fht_ll,m_vec_ndx,8);                      % subband LL FHT

[f_fht_lh_ll,f_fht_lh_lh,f_fht_lh_hl,f_fht_lh_hh] = ...
fht(f_fht_lh,m_vec_ndx,8);                      % subband LH FHT

[f_fht_hl_ll,f_fht_hl_lh,f_fht_hl_hl,f_fht_hl_hh] = ...
fht(f_fht_hl,m_vec_ndx,8);                      % subband HL FHT

[f_fht_hh_ll,f_fht_hh_lh,f_fht_hh_hl,f_fht_hh_hh] = ...
fht(f_fht_hh,m_vec_ndx,8);                      % subband HH FHT

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Quantizing %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
f_fht_ll_ll_q = round(f_fht_ll_ll/q1);          % quantize LL subbands
f_fht_ll_lh_q = round(f_fht_ll_lh/q2);
f_fht_ll_hl_q = round(f_fht_ll_hl/q2);
f_fht_ll_hh_q = round(f_fht_ll_hh/q3);

f_fht_lh_ll_q = round(f_fht_lh_ll/q2);          % quantize LH subbands
f_fht_lh_lh_q = round(f_fht_lh_lh/q2);
f_fht_lh_hl_q = round(f_fht_lh_hl/q3);
f_fht_lh_hh_q = round(f_fht_lh_hh/q3);

f_fht_hl_ll_q = round(f_fht_hl_ll/q2);          % quantize HL subbands
f_fht_hl_lh_q = round(f_fht_hl_lh/q3);
f_fht_hl_hl_q = round(f_fht_hl_hl/q2);
f_fht_hl_hh_q = round(f_fht_hl_hh/q3);

f_fht_hh_ll_q = round(f_fht_hh_ll/q3);          % quantize HH subbands
f_fht_hh_lh_q = round(f_fht_hh_lh/q3);
f_fht_hh_hl_q = round(f_fht_hh_hl/q3);
f_fht_hh_hh_q = round(f_fht_hh_hh/q3);


TSEELEEEEESEESESEEEEESEEEEEEETEESEEEEEEGELELELELELEEELELEELELESEEELETETESELSEEEEEETESES 
SESSTEEESEEESEESESSEESESEEETESS Working on LL SESSEEEEETEEESESEEELELEEESTEEEEELETTESS 
SETLSEESEEEELEEESEEESESESESEETEETEEELETETELEEETEEEEEEEEELELELEETETEEETEEEELELETESETEEEEES 
TELEEEELESEEEEEEEEEEEEEEEEEE DOALng the LL_LL subband SESESEEEESTSETELEESELELELETELESS 


% scans each LL_LL 4x4 (Results are one 1584x1 vector [1i_ll_r] and the index of 


% the last non-zero entity [last_11_1l_r], where the r indicates 
% horizontal raster scan method. Get "inf" if a group of 16 is all 
% zeros.) 


[last_ll_11r,il_ll_r] = raster(f£_fht_1l1_1l_q,4); 


% gets rid of trailing zeros (one big vector of varying size) 
a_1l_11or = make_it_compact(11_ll_r,last_11l_ll_r,4); 
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% parsing LL_LL and getting the number of parsings for later used in eval_thesis to 
% ‘get the percent of time that the default bit number is used. 
parsed_1i_ll_ur_3D = parse_3D(a_li_ll_r,iast_lli_ll_r); 

1_p_11_11_r_ 3D(i+1) = length (parsed_1l1_11l_r_3D(:,1)); 


% gets bits per frame and number of times the default is chosen 

% (includes 99 COD bits) 

{[bits_11_li_r_3D(i+1), llvilur_22(it+1)] = get_bits2 (parsed_11_11_r_3D, frame_type) ; © 

%%%%%%%%%%%%%%%%%%%% Doing the LL_LH subband %%%%%%%%%%%%%%%%%%%%%

% scans each LL_LH 4x4 block (results are one 1584x1 vector [ll_lh_r] and, for
% each block, the index of the last non-zero entry [last_ll_lh_r]; the r
% indicates the horizontal raster scan, and the index is "inf" if a group of 16
% is all zeros)
[last_ll_lh_r,ll_lh_r] = raster(f_fht_ll_lh_q,4);

% gets rid of trailing zeros (one big vector of varying size)
a_ll_lh_r = make_it_compact(ll_lh_r,last_ll_lh_r,4);

% parses LL_LH and records the number of parsings for later use in eval_thesis
% to get the percentage of time that the default bit number is used
parsed_ll_lh_r_3D = parse_3D(a_ll_lh_r,last_ll_lh_r);
l_p_ll_lh_r_3D(i+1) = length(parsed_ll_lh_r_3D(:,1));

% gets bits per frame and the number of times the default is chosen
% (includes 99 COD bits)
[bits_ll_lh_r_3D(i+1), ll_lh_r_22(i+1)] = get_bits2(parsed_ll_lh_r_3D, frame_type);

%%%%%%%%%%%%%%%%%%%% Doing the LL_HL subband %%%%%%%%%%%%%%%%%%%%%

% scans each LL_HL 4x4 block (results are one 1584x1 vector [ll_hl_r] and, for
% each block, the index of the last non-zero entry [last_ll_hl_r]; the r
% indicates the horizontal raster scan, and the index is "inf" if a group of 16
% is all zeros)
[last_ll_hl_r,ll_hl_r] = raster(f_fht_ll_hl_q,4);

% gets rid of trailing zeros (one big vector of varying size)
a_ll_hl_r = make_it_compact(ll_hl_r,last_ll_hl_r,4);

% parses LL_HL and records the number of parsings for later use in eval_thesis
% to get the percentage of time that the default bit number is used
parsed_ll_hl_r_3D = parse_3D(a_ll_hl_r,last_ll_hl_r);
l_p_ll_hl_r_3D(i+1) = length(parsed_ll_hl_r_3D(:,1));

% gets bits per frame and the number of times the default is chosen
% (includes 99 COD bits)
[bits_ll_hl_r_3D(i+1), ll_hl_r_22(i+1)] = get_bits2(parsed_ll_hl_r_3D, frame_type);


%%%%%%%%%%%%%%%%%%%% Doing the LL_HH subband %%%%%%%%%%%%%%%%%%%%%

% scans each LL_HH 4x4 block (results are one 1584x1 vector [ll_hh_r] and, for
% each block, the index of the last non-zero entry [last_ll_hh_r]; the r
% indicates the horizontal raster scan, and the index is "inf" if a group of 16
% is all zeros)
[last_ll_hh_r,ll_hh_r] = raster(f_fht_ll_hh_q,4);

% gets rid of trailing zeros (one big vector of varying size)
a_ll_hh_r = make_it_compact(ll_hh_r,last_ll_hh_r,4);

% parses LL_HH and records the number of parsings for later use in eval_thesis
% to get the percentage of time that the default bit number is used
parsed_ll_hh_r_3D = parse_3D(a_ll_hh_r,last_ll_hh_r);
l_p_ll_hh_r_3D(i+1) = length(parsed_ll_hh_r_3D(:,1));

% gets bits per frame and the number of times the default is chosen
% (includes 99 COD bits)
[bits_ll_hh_r_3D(i+1), ll_hh_r_22(i+1)] = get_bits2(parsed_ll_hh_r_3D, frame_type);


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%% Working on LH subbands %%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%% Doing the LH_LL subband %%%%%%%%%%%%%%%%%%%%%

% scans each LH_LL 4x4 block (results are one 1584x1 vector [lh_ll_r] and, for
% each block, the index of the last non-zero entry [last_lh_ll_r]; the r
% indicates the horizontal raster scan, and the index is "inf" if a group of 16
% is all zeros)
[last_lh_ll_r,lh_ll_r] = raster(f_fht_lh_ll_q,4);

% gets rid of trailing zeros (one big vector of varying size)
a_lh_ll_r = make_it_compact(lh_ll_r,last_lh_ll_r,4);

% parses LH_LL and records the number of parsings for later use in eval_thesis
% to get the percentage of time that the default bit number is used
parsed_lh_ll_r_3D = parse_3D(a_lh_ll_r,last_lh_ll_r);
l_p_lh_ll_r_3D(i+1) = length(parsed_lh_ll_r_3D(:,1));

% gets bits per frame and the number of times the default is chosen
% (includes 99 COD bits)
[bits_lh_ll_r_3D(i+1), lh_ll_r_22(i+1)] = get_bits2(parsed_lh_ll_r_3D, frame_type);

%%%%%%%%%%%%%%%%%%%% Doing the LH_LH subband %%%%%%%%%%%%%%%%%%%%%

% scans each LH_LH 4x4 block (results are one 1584x1 vector [lh_lh_r] and, for
% each block, the index of the last non-zero entry [last_lh_lh_r]; the r
% indicates the horizontal raster scan, and the index is "inf" if a group of 16
% is all zeros)
[last_lh_lh_r,lh_lh_r] = raster(f_fht_lh_lh_q,4);

% gets rid of trailing zeros (one big vector of varying size)
a_lh_lh_r = make_it_compact(lh_lh_r,last_lh_lh_r,4);

% parses LH_LH and records the number of parsings for later use in eval_thesis
% to get the percentage of time that the default bit number is used
parsed_lh_lh_r_3D = parse_3D(a_lh_lh_r,last_lh_lh_r);
l_p_lh_lh_r_3D(i+1) = length(parsed_lh_lh_r_3D(:,1));

% gets bits per frame and the number of times the default is chosen
% (includes 99 COD bits)
[bits_lh_lh_r_3D(i+1), lh_lh_r_22(i+1)] = get_bits2(parsed_lh_lh_r_3D, frame_type);

%%%%%%%%%%%%%%%%%%%% Doing the LH_HL subband %%%%%%%%%%%%%%%%%%%%%

% scans each LH_HL 4x4 block (results are one 1584x1 vector [lh_hl_r] and, for
% each block, the index of the last non-zero entry [last_lh_hl_r]; the r
% indicates the horizontal raster scan, and the index is "inf" if a group of 16
% is all zeros)
[last_lh_hl_r,lh_hl_r] = raster(f_fht_lh_hl_q,4);

% gets rid of trailing zeros (one big vector of varying size)
a_lh_hl_r = make_it_compact(lh_hl_r,last_lh_hl_r,4);

% parses LH_HL and records the number of parsings for later use in eval_thesis
% to get the percentage of time that the default bit number is used
parsed_lh_hl_r_3D = parse_3D(a_lh_hl_r,last_lh_hl_r);
l_p_lh_hl_r_3D(i+1) = length(parsed_lh_hl_r_3D(:,1));

% gets bits per frame and the number of times the default is chosen
% (includes 99 COD bits)
[bits_lh_hl_r_3D(i+1), lh_hl_r_22(i+1)] = get_bits2(parsed_lh_hl_r_3D, frame_type);

%%%%%%%%%%%%%%%%%%%% Doing the LH_HH subband %%%%%%%%%%%%%%%%%%%%%

% scans each LH_HH 4x4 block (results are one 1584x1 vector [lh_hh_r] and, for
% each block, the index of the last non-zero entry [last_lh_hh_r]; the r
% indicates the horizontal raster scan, and the index is "inf" if a group of 16
% is all zeros)
[last_lh_hh_r,lh_hh_r] = raster(f_fht_lh_hh_q,4);

% gets rid of trailing zeros (one big vector of varying size)
a_lh_hh_r = make_it_compact(lh_hh_r,last_lh_hh_r,4);

% parses LH_HH and records the number of parsings for later use in eval_thesis
% to get the percentage of time that the default bit number is used
parsed_lh_hh_r_3D = parse_3D(a_lh_hh_r,last_lh_hh_r);
l_p_lh_hh_r_3D(i+1) = length(parsed_lh_hh_r_3D(:,1));

% gets bits per frame and the number of times the default is chosen
% (includes 99 COD bits)
[bits_lh_hh_r_3D(i+1), lh_hh_r_22(i+1)] = get_bits2(parsed_lh_hh_r_3D, frame_type);


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%% Working on HL subbands %%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%% Doing the HL_LL subband %%%%%%%%%%%%%%%%%%%%%

% scans each HL_LL 4x4 block (results are one 1584x1 vector [hl_ll_v] and, for
% each block, the index of the last non-zero entry [last_hl_ll_v]; the v
% indicates the vertical raster scan, and the index is "inf" if a group of 16
% is all zeros)
[last_hl_ll_v,hl_ll_v] = vertical(f_fht_hl_ll_q,4);

% gets rid of trailing zeros (one big vector of varying size)
a_hl_ll_v = make_it_compact(hl_ll_v,last_hl_ll_v,4);

% parses HL_LL and records the number of parsings for later use in eval_thesis
% to get the percentage of time that the default bit number is used
parsed_hl_ll_v_3D = parse_3D(a_hl_ll_v,last_hl_ll_v);
l_p_hl_ll_v_3D(i+1) = length(parsed_hl_ll_v_3D(:,1));

% gets bits per frame and the number of times the default is chosen
% (includes 99 COD bits)
[bits_hl_ll_v_3D(i+1), hl_ll_v_22(i+1)] = get_bits2(parsed_hl_ll_v_3D, frame_type);

%%%%%%%%%%%%%%%%%%%% Doing the HL_LH subband %%%%%%%%%%%%%%%%%%%%%

% scans each HL_LH 4x4 block (results are one 1584x1 vector [hl_lh_v] and, for
% each block, the index of the last non-zero entry [last_hl_lh_v]; the v
% indicates the vertical raster scan, and the index is "inf" if a group of 16
% is all zeros)
[last_hl_lh_v,hl_lh_v] = vertical(f_fht_hl_lh_q,4);

% gets rid of trailing zeros (one big vector of varying size)
a_hl_lh_v = make_it_compact(hl_lh_v,last_hl_lh_v,4);

% parses HL_LH and records the number of parsings for later use in eval_thesis
% to get the percentage of time that the default bit number is used
parsed_hl_lh_v_3D = parse_3D(a_hl_lh_v,last_hl_lh_v);
l_p_hl_lh_v_3D(i+1) = length(parsed_hl_lh_v_3D(:,1));

% gets bits per frame and the number of times the default is chosen
% (includes 99 COD bits)
[bits_hl_lh_v_3D(i+1), hl_lh_v_22(i+1)] = get_bits2(parsed_hl_lh_v_3D, frame_type);

%%%%%%%%%%%%%%%%%%%% Doing the HL_HL subband %%%%%%%%%%%%%%%%%%%%%

% scans each HL_HL 4x4 block (results are one 1584x1 vector [hl_hl_v] and, for
% each block, the index of the last non-zero entry [last_hl_hl_v]; the v
% indicates the vertical raster scan, and the index is "inf" if a group of 16
% is all zeros)
[last_hl_hl_v,hl_hl_v] = vertical(f_fht_hl_hl_q,4);

% gets rid of trailing zeros (one big vector of varying size)
a_hl_hl_v = make_it_compact(hl_hl_v,last_hl_hl_v,4);

% parses HL_HL and records the number of parsings for later use in eval_thesis
% to get the percentage of time that the default bit number is used
parsed_hl_hl_v_3D = parse_3D(a_hl_hl_v,last_hl_hl_v);
l_p_hl_hl_v_3D(i+1) = length(parsed_hl_hl_v_3D(:,1));

% gets bits per frame and the number of times the default is chosen
% (includes 99 COD bits)
[bits_hl_hl_v_3D(i+1), hl_hl_v_22(i+1)] = get_bits2(parsed_hl_hl_v_3D, frame_type);

%%%%%%%%%%%%%%%%%%%% Doing the HL_HH subband %%%%%%%%%%%%%%%%%%%%%

% scans each HL_HH 4x4 block (results are one 1584x1 vector [hl_hh_v] and, for
% each block, the index of the last non-zero entry [last_hl_hh_v]; the v
% indicates the vertical raster scan, and the index is "inf" if a group of 16
% is all zeros)
[last_hl_hh_v,hl_hh_v] = vertical(f_fht_hl_hh_q,4);

% gets rid of trailing zeros (one big vector of varying size)
a_hl_hh_v = make_it_compact(hl_hh_v,last_hl_hh_v,4);

% parses HL_HH and records the number of parsings for later use in eval_thesis
% to get the percentage of time that the default bit number is used
parsed_hl_hh_v_3D = parse_3D(a_hl_hh_v,last_hl_hh_v);
l_p_hl_hh_v_3D(i+1) = length(parsed_hl_hh_v_3D(:,1));

% gets bits per frame and the number of times the default is chosen
% (includes 99 COD bits)
[bits_hl_hh_v_3D(i+1), hl_hh_v_22(i+1)] = get_bits2(parsed_hl_hh_v_3D, frame_type);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%% Working on HH subbands %%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%% Doing the HH_LL subband %%%%%%%%%%%%%%%%%%%%%

% scans each HH_LL 4x4 block (results are one 1584x1 vector [hh_ll_r] and, for
% each block, the index of the last non-zero entry [last_hh_ll_r]; the r
% indicates the horizontal raster scan, and the index is "inf" if a group of 16
% is all zeros)
[last_hh_ll_r,hh_ll_r] = raster(f_fht_hh_ll_q,4);

% gets rid of trailing zeros (one big vector of varying size)
a_hh_ll_r = make_it_compact(hh_ll_r,last_hh_ll_r,4);

% parses HH_LL and records the number of parsings for later use in eval_thesis
% to get the percentage of time that the default bit number is used
parsed_hh_ll_r_3D = parse_3D(a_hh_ll_r,last_hh_ll_r);
l_p_hh_ll_r_3D(i+1) = length(parsed_hh_ll_r_3D(:,1));

% gets bits per frame and the number of times the default is chosen
% (includes 99 COD bits)
[bits_hh_ll_r_3D(i+1), hh_ll_r_22(i+1)] = get_bits2(parsed_hh_ll_r_3D, frame_type);

%%%%%%%%%%%%%%%%%%%% Doing the HH_LH subband %%%%%%%%%%%%%%%%%%%%%

% scans each HH_LH 4x4 block (results are one 1584x1 vector [hh_lh_r] and, for
% each block, the index of the last non-zero entry [last_hh_lh_r]; the r
% indicates the horizontal raster scan, and the index is "inf" if a group of 16
% is all zeros)
[last_hh_lh_r,hh_lh_r] = raster(f_fht_hh_lh_q,4);

% gets rid of trailing zeros (one big vector of varying size)
a_hh_lh_r = make_it_compact(hh_lh_r,last_hh_lh_r,4);

% parses HH_LH and records the number of parsings for later use in eval_thesis
% to get the percentage of time that the default bit number is used
parsed_hh_lh_r_3D = parse_3D(a_hh_lh_r,last_hh_lh_r);
l_p_hh_lh_r_3D(i+1) = length(parsed_hh_lh_r_3D(:,1));

% gets bits per frame and the number of times the default is chosen
% (includes 99 COD bits)
[bits_hh_lh_r_3D(i+1), hh_lh_r_22(i+1)] = get_bits2(parsed_hh_lh_r_3D, frame_type);

%%%%%%%%%%%%%%%%%%%% Doing the HH_HL subband %%%%%%%%%%%%%%%%%%%%%

% scans each HH_HL 4x4 block (results are one 1584x1 vector [hh_hl_r] and, for
% each block, the index of the last non-zero entry [last_hh_hl_r]; the r
% indicates the horizontal raster scan, and the index is "inf" if a group of 16
% is all zeros)
[last_hh_hl_r,hh_hl_r] = raster(f_fht_hh_hl_q,4);

% gets rid of trailing zeros (one big vector of varying size)
a_hh_hl_r = make_it_compact(hh_hl_r,last_hh_hl_r,4);

% parses HH_HL and records the number of parsings for later use in eval_thesis
% to get the percentage of time that the default bit number is used
parsed_hh_hl_r_3D = parse_3D(a_hh_hl_r,last_hh_hl_r);
l_p_hh_hl_r_3D(i+1) = length(parsed_hh_hl_r_3D(:,1));

% gets bits per frame and the number of times the default is chosen
% (includes 99 COD bits)
[bits_hh_hl_r_3D(i+1), hh_hl_r_22(i+1)] = get_bits2(parsed_hh_hl_r_3D, frame_type);

%%%%%%%%%%%%%%%%%%%% Doing the HH_HH subband %%%%%%%%%%%%%%%%%%%%%

% scans each HH_HH 4x4 block (results are one 1584x1 vector [hh_hh_r] and, for
% each block, the index of the last non-zero entry [last_hh_hh_r]; the r
% indicates the horizontal raster scan, and the index is "inf" if a group of 16
% is all zeros)
[last_hh_hh_r,hh_hh_r] = raster(f_fht_hh_hh_q,4);

% gets rid of trailing zeros (one big vector of varying size)
a_hh_hh_r = make_it_compact(hh_hh_r,last_hh_hh_r,4);

% parses HH_HH and records the number of parsings for later use in eval_thesis
% to get the percentage of time that the default bit number is used
parsed_hh_hh_r_3D = parse_3D(a_hh_hh_r,last_hh_hh_r);
l_p_hh_hh_r_3D(i+1) = length(parsed_hh_hh_r_3D(:,1));

% gets bits per frame and the number of times the default is chosen
% (includes 99 COD bits)
[bits_hh_hh_r_3D(i+1), hh_hh_r_22(i+1)] = get_bits2(parsed_hh_hh_r_3D, frame_type);


% getting bits per frame per layer and in total
layer1(i+1) = bits_ll_ll_r_3D(i+1) + bits_ll_lh_r_3D(i+1) + ...
bits_lh_ll_r_3D(i+1) + bits_lh_lh_r_3D(i+1) + ...
bits_ll_hl_r_3D(i+1) + bits_hl_ll_v_3D(i+1) + ...
bits_hh_ll_r_3D(i+1) + bits_hl_hl_v_3D(i+1) + ...
bits_hh_hh_r_3D(i+1);
layer2(i+1) = bits_ll_hh_r_3D(i+1) + bits_lh_hl_r_3D(i+1) + ...
bits_lh_hh_r_3D(i+1) + bits_hl_lh_v_3D(i+1) + ...
bits_hl_hh_v_3D(i+1);
layer3(i+1) = bits_hh_lh_r_3D(i+1) + bits_hh_hl_r_3D(i+1);
total(i+1) = layer1(i+1) + layer2(i+1) + layer3(i+1);


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%                                %
%            Channel             %
%                                %

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%% Unquantize %%%%%%%%%%%%%%%%%%%%%%%%%%%%


f_fht_ll_ll_uq = f_fht_ll_ll_q * q1;   % unquantize the LL subbands
f_fht_ll_lh_uq = f_fht_ll_lh_q * q2;
f_fht_ll_hl_uq = f_fht_ll_hl_q * q2;
f_fht_ll_hh_uq = f_fht_ll_hh_q * q3;

f_fht_lh_ll_uq = f_fht_lh_ll_q * q2;   % unquantize the LH subbands
f_fht_lh_lh_uq = f_fht_lh_lh_q * q2;
f_fht_lh_hl_uq = f_fht_lh_hl_q * q3;
f_fht_lh_hh_uq = f_fht_lh_hh_q * q3;

f_fht_hl_ll_uq = f_fht_hl_ll_q * q2;   % unquantize the HL subbands
f_fht_hl_lh_uq = f_fht_hl_lh_q * q3;
f_fht_hl_hl_uq = f_fht_hl_hl_q * q2;
f_fht_hl_hh_uq = f_fht_hl_hh_q * q3;

f_fht_hh_ll_uq = f_fht_hh_ll_q * q3;   % unquantize the HH subbands
f_fht_hh_lh_uq = f_fht_hh_lh_q * q3;
f_fht_hh_hl_uq = f_fht_hh_hl_q * q3;
f_fht_hh_hh_uq = f_fht_hh_hh_q * q3;


%%%%%%%%%%%%%%%%%%%%%%%%% Inverse Transform %%%%%%%%%%%%%%%%%%%%%%%%

f_far_ll = remake_3(f_far_ll,f_fht_ll_ll_uq,f_fht_ll_lh_uq,f_fht_ll_hl_uq, ...
f_fht_ll_hh_uq,m_vec_ndx,8);   % LL subband inv FHT
f_far_lh = remake_3(f_far_lh,f_fht_lh_ll_uq,f_fht_lh_lh_uq,f_fht_lh_hl_uq, ...
f_fht_lh_hh_uq,m_vec_ndx,8);   % LH subband inv FHT
f_far_hl = remake_3(f_far_hl,f_fht_hl_ll_uq,f_fht_hl_lh_uq,f_fht_hl_hl_uq, ...
f_fht_hl_hh_uq,m_vec_ndx,8);   % HL subband inv FHT
f_far_hh = remake_3(f_far_hh,f_fht_hh_ll_uq,f_fht_hh_lh_uq,f_fht_hh_hl_uq, ...
f_fht_hh_hh_uq,m_vec_ndx,8);   % HH subband inv FHT

f_far_ll_p2 = remake_3(f_far_ll_p2,f_fht_ll_ll_uq,f_fht_ll_lh_uq, ...
f_fht_ll_hl_uq,f_fht_ll_hh_uq,m_vec_ndx,8);   % LL p2 subband inv FHT
f_far_lh_p2 = remake_3(f_far_lh_p2,f_fht_lh_ll_uq,f_fht_lh_lh_uq, ...
f_fht_lh_hl_uq,f_fht_lh_hh_uq,m_vec_ndx,8);   % LH p2 subband inv FHT
f_far_hl_p2 = remake_3(f_far_hl_p2,f_fht_hl_ll_uq,f_fht_hl_lh_uq, ...
f_fht_hl_hl_uq,f_fht_hl_hh_uq,m_vec_ndx,8);   % HL p2 subband inv FHT
f_far_hh_p2 = remake_3(f_far_hh_p2,f_fht_hh_ll_uq,0,0, ...
f_fht_hh_hh_uq,m_vec_ndx,8);   % HH p2 subband inv FHT

f_far_ll_p1 = remake_3(f_far_ll_p1,f_fht_ll_ll_uq,f_fht_ll_lh_uq, ...
f_fht_ll_hl_uq,0,m_vec_ndx,8);   % LL p1 subband inv FHT
f_far_lh_p1 = remake_3(f_far_lh_p1,f_fht_lh_ll_uq,f_fht_lh_lh_uq, ...
0,0,m_vec_ndx,8);   % LH p1 subband inv FHT
f_far_hl_p1 = remake_3(f_far_hl_p1,f_fht_hl_ll_uq,0, ...
f_fht_hl_hl_uq,0,m_vec_ndx,8);   % HL p1 subband inv FHT
f_far_hh_p1 = remake_3(f_far_hh_p1,f_fht_hh_ll_uq,0, ...
0,f_fht_hh_hh_uq,m_vec_ndx,8);   % HH p1 subband inv FHT

f_3 = remake_3(f_3,f_far_ll,f_far_lh,f_far_hl, ...
f_far_hh,m_vec_ndx,16);   % frame inv FHT with 3 layers
f_2 = remake_3(f_2,f_far_ll_p2,f_far_lh_p2,f_far_hl_p2, ...
f_far_hh_p2,m_vec_ndx,16);   % frame inv FHT with 2 layers
f_1 = remake_3(f_1,f_far_ll_p1,f_far_lh_p1,f_far_hl_p1, ...
f_far_hh_p1,m_vec_ndx,16);   % frame inv FHT with 1 layer

end % ends frame_type discrimination


%%%%%%%%%%%%%%%%%%%%%%%%% Make Display Size %%%%%%%%%%%%%%%%%%%%%%%%

fig_3 = shape_back(f_3);   % make display dimensions
fig_2 = shape_back(f_2);   % make display dimensions
fig_1 = shape_back(f_1);   % make display dimensions
L3 = round(fig_3);         % for viewing
L2 = round(fig_2);         % for viewing
L1 = round(fig_1);         % for viewing

%%%%%%%%%%%%%%% Calculates the mean-square-error per frame %%%%%%%%%%%%%%%

if mse
MSE_3L(i+1,1) = (sum(sum((i_present - fig_3).^2)))/176/144;
MSE_2L(i+1,1) = (sum(sum((i_present - fig_2).^2)))/176/144;
MSE_1L(i+1,1) = (sum(sum((i_present - fig_1).^2)))/176/144;
end % end if mse





if display 


% The following lines show the figure with 3 layers 
subplot (2,2,2) 

image (L3) 

axis off 

title('Layers 1, 2, and 3')


% The following lines show the figure with 2 layers 
subplot (2,2,3) 

image (L2) 

title('Layers 1 and 2')

axis off 


% The following lines show the figure with 1 layer 
subplot (2,2,4) 

image (L1) 

title('Layer 1 Only')

axis off 

drawnow 


end % ends if display 


% closes the figure windows every 20 frames (frames 20 through 280) to conserve memory
if any(i == 20:20:280)
close all 
end 
end % ends looping through frames 


% The following lines simply rename variables to be consistent with an off-line
% evaluation program used during development.

% LL dynamic
BITS_LL_HUFF = bits_ll_2z';


% LL static
BITS_LL_LL_3D = bits_ll_ll_r_3D';
SCAN_LL_LL_22 = ll_ll_r_22';
SCAN_LL_LL_LEN = l_p_ll_ll_r_3D';
BITS_LL_LH_3D = bits_ll_lh_r_3D';
SCAN_LL_LH_22 = ll_lh_r_22';
SCAN_LL_LH_LEN = l_p_ll_lh_r_3D';
BITS_LL_HL_3D = bits_ll_hl_r_3D';
SCAN_LL_HL_22 = ll_hl_r_22';
SCAN_LL_HL_LEN = l_p_ll_hl_r_3D';
BITS_LL_HH_3D = bits_ll_hh_r_3D';
SCAN_LL_HH_22 = ll_hh_r_22';
SCAN_LL_HH_LEN = l_p_ll_hh_r_3D';

% LH, both types of slides
BITS_LH_LL_3D = bits_lh_ll_r_3D';
SCAN_LH_LL_22 = lh_ll_r_22';
SCAN_LH_LL_LEN = l_p_lh_ll_r_3D';
BITS_LH_LH_3D = bits_lh_lh_r_3D';
SCAN_LH_LH_22 = lh_lh_r_22';
SCAN_LH_LH_LEN = l_p_lh_lh_r_3D';
BITS_LH_HL_3D = bits_lh_hl_r_3D';
SCAN_LH_HL_22 = lh_hl_r_22';
SCAN_LH_HL_LEN = l_p_lh_hl_r_3D';
BITS_LH_HH_3D = bits_lh_hh_r_3D';
SCAN_LH_HH_22 = lh_hh_r_22';
SCAN_LH_HH_LEN = l_p_lh_hh_r_3D';

% HL, both types of slides
BITS_HL_LL_3D = bits_hl_ll_v_3D';
SCAN_HL_LL_22 = hl_ll_v_22';
SCAN_HL_LL_LEN = l_p_hl_ll_v_3D';
BITS_HL_LH_3D = bits_hl_lh_v_3D';
SCAN_HL_LH_22 = hl_lh_v_22';
SCAN_HL_LH_LEN = l_p_hl_lh_v_3D';
BITS_HL_HL_3D = bits_hl_hl_v_3D';
SCAN_HL_HL_22 = hl_hl_v_22';
SCAN_HL_HL_LEN = l_p_hl_hl_v_3D';
BITS_HL_HH_3D = bits_hl_hh_v_3D';
SCAN_HL_HH_22 = hl_hh_v_22';
SCAN_HL_HH_LEN = l_p_hl_hh_v_3D';

% HH from dynamic
BITS_HH_3D = bits_hh_r_3D';
SCAN_HH_22 = hh_r_22';
SCAN_HH_LEN = l_p_hh_r_3D';

% HH from static
BITS_HH_LL_3D = bits_hh_ll_r_3D';
SCAN_HH_LL_22 = hh_ll_r_22';
SCAN_HH_LL_LEN = l_p_hh_ll_r_3D';
BITS_HH_LH_3D = bits_hh_lh_r_3D';
SCAN_HH_LH_22 = hh_lh_r_22';
SCAN_HH_LH_LEN = l_p_hh_lh_r_3D';
BITS_HH_HL_3D = bits_hh_hl_r_3D';
SCAN_HH_HL_22 = hh_hl_r_22';
SCAN_HH_HL_LEN = l_p_hh_hl_r_3D';
BITS_HH_HH_3D = bits_hh_hh_r_3D';
SCAN_HH_HH_22 = hh_hh_r_22';
SCAN_HH_HH_LEN = l_p_hh_hh_r_3D';


% saves parameters for later evaluation
if write
s = char('g:\thesis\BR');
s = strcat(s,num2str(br));
save(s, 'm_mat_ndx', 'BITS_*', 'MSE_*', 'SCAN_*', 'layer*')
end % if write
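Every subband in the loop above goes through the same front end: a horizontal (raster) or vertical scan of each 4x4 block, then make_it_compact to drop trailing zeros before parsing. Because raster, vertical, and make_it_compact are defined elsewhere in this appendix, the Python sketch below only assumes the behaviour their comments describe: each block is read row by row (or column by column), the 1-based position of its last non-zero coefficient is recorded (infinity for an all-zero block), and each block's trailing zeros are then removed. The names scan_blocks and compact are hypothetical.

```python
import math

def scan_blocks(stacked, n=4, vertical=False):
    """Scan every n x n block of a vertically stacked matrix.

    stacked is a list of rows (num_blocks * n rows, each of length n).
    Returns (last, vec): last[k] is the 1-based position of the final
    non-zero coefficient of block k (math.inf if the block is all zero),
    and vec is the concatenation of all scanned blocks."""
    vec, last = [], []
    for k in range(len(stacked) // n):
        block = stacked[k * n:(k + 1) * n]
        if vertical:
            seq = [block[r][c] for c in range(n) for r in range(n)]  # column by column
        else:
            seq = [block[r][c] for r in range(n) for c in range(n)]  # row by row
        nonzero = [i + 1 for i, v in enumerate(seq) if v != 0]
        last.append(nonzero[-1] if nonzero else math.inf)
        vec.extend(seq)
    return last, vec

def compact(vec, last, n=4):
    """Keep each block only up to its last non-zero entry; all-zero blocks vanish."""
    size = n * n
    out = []
    for k, end in enumerate(last):
        if end != math.inf:
            out.extend(vec[k * size:k * size + end])
    return out
```

With one block holding [5, 2] in its first row and one all-zero block, the horizontal scan keeps [5, 2], while the vertical scan keeps [5, 0, 0, 0, 2] because the 2 is only reached at the start of the second column.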





function [output] = dct_of_fht(input,m_vec_ndx)

% Performs the 2D DCT of each 8x8 block of INPUT. M_VEC_NDX identifies the
% blocks where this operation needs to be performed.

output = zeros(792,8);
offset = -8;

for ndx = 1:99
  offset = offset + 8;
  if m_vec_ndx(ndx)
    output(offset+1:offset+8,:) = dct2(input(offset+1:offset+8,:));
  end
end
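dct_of_fht applies MATLAB's dct2 only to the 8x8 blocks flagged in m_vec_ndx, leaving zeros elsewhere, and invdct_of_fht undoes it with idct2. The Python sketch below reproduces that block-selective behaviour, assuming the orthonormal DCT-II scaling used by dct2; the helper names are hypothetical and the plain loops favour clarity over speed.

```python
import math

def _dct_1d(x):
    """Orthonormal DCT-II of a sequence (the scaling used by MATLAB's dct)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n)))
    return out

def _idct_1d(c):
    """Inverse of _dct_1d (orthonormal DCT-III)."""
    n = len(c)
    return [sum((math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)) * c[k]
                * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for k in range(n))
            for i in range(n)]

def _separable(block, f):
    """Apply a 1D transform to every row, then to every column."""
    rows = [f(list(r)) for r in block]
    cols = [f(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def blockwise(stacked, selected, inverse=False, n=8):
    """Transform block k of a (num_blocks*n) x n matrix only where selected[k]
    is true; unselected blocks stay zero, as in dct_of_fht/invdct_of_fht."""
    f = _idct_1d if inverse else _dct_1d
    out = [[0.0] * n for _ in range(len(stacked))]
    for k, flag in enumerate(selected):
        if flag:
            out[k * n:(k + 1) * n] = _separable(stacked[k * n:(k + 1) * n], f)
    return out
```

For a constant 8x8 block of value 10, the only non-zero coefficient is the DC term, 10 * 8 = 80.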
function [f_fht_ll,f_fht_lh,f_fht_hl,f_fht_hh] = fht(f_vec,m_vec_ndx,in)

% Performs the FHT of the appropriate IN x IN macroblocks of F_VEC as
% specified by M_VEC_NDX. Returns four sets of 99 IN/2 x IN/2 matrices,
% each stacked into one big 99*IN/2 x IN/2 matrix. The places
% where the FHT was not performed are filled with zeros as placeholders.

half = in/2;
f_fht_ll = zeros(99*half,half);
f_fht_lh = zeros(99*half,half);
f_fht_hl = zeros(99*half,half);
f_fht_hh = zeros(99*half,half);
mask_lh = [1 1;-1 -1];
mask_hl = [1 -1;1 -1];
mask_hh = [1 -1;-1 1];
offset = -in;

for ndx = 1:99
  offset = offset + in;
  oso2 = offset/2;
  if m_vec_ndx(ndx)
    for rndx = 1:half
      for cndx = 1:half
        f_fht_ll(rndx+oso2,cndx) = ...
          sum(sum(f_vec(rndx*2-1+offset:rndx*2+offset, cndx*2-1:cndx*2)))/4;
        f_fht_lh(rndx+oso2,cndx) = sum(sum(mask_lh .* ...
          f_vec(rndx*2-1+offset:rndx*2+offset, cndx*2-1:cndx*2)))/4;
        f_fht_hl(rndx+oso2,cndx) = sum(sum(mask_hl .* ...
          f_vec(rndx*2-1+offset:rndx*2+offset, cndx*2-1:cndx*2)))/4;
        f_fht_hh(rndx+oso2,cndx) = sum(sum(mask_hh .* ...
          f_vec(rndx*2-1+offset:rndx*2+offset, cndx*2-1:cndx*2)))/4;
      end
    end
  end
end
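fht computes one level of the Haar transform by applying the four 2x2 masks above to every 2x2 group of pixels and dividing by 4. The Python sketch below performs the same computation on a single square block. The inverse is our own derivation from those masks and is offered only as a consistency check; it is not taken from remake_3, whose code is not part of this excerpt. Function names are hypothetical.

```python
def haar_level(block):
    """One level of the 2D Haar transform of an even-sized square block.

    Each 2x2 group [[a, b], [c, d]] yields ll = (a+b+c+d)/4,
    lh = (a+b-c-d)/4, hl = (a-b+c-d)/4 and hh = (a-b-c+d)/4."""
    half = len(block) // 2
    ll = [[0.0] * half for _ in range(half)]
    lh = [[0.0] * half for _ in range(half)]
    hl = [[0.0] * half for _ in range(half)]
    hh = [[0.0] * half for _ in range(half)]
    for r in range(half):
        for col in range(half):
            a, b = block[2 * r][2 * col], block[2 * r][2 * col + 1]
            c, d = block[2 * r + 1][2 * col], block[2 * r + 1][2 * col + 1]
            ll[r][col] = (a + b + c + d) / 4   # plain sum (all-ones mask)
            lh[r][col] = (a + b - c - d) / 4   # mask_lh = [1 1; -1 -1]
            hl[r][col] = (a - b + c - d) / 4   # mask_hl = [1 -1; 1 -1]
            hh[r][col] = (a - b - c + d) / 4   # mask_hh = [1 -1; -1 1]
    return ll, lh, hl, hh

def inverse_haar_level(ll, lh, hl, hh):
    """Rebuild the block from its four subbands (derived from the masks above)."""
    half = len(ll)
    block = [[0.0] * (2 * half) for _ in range(2 * half)]
    for r in range(half):
        for col in range(half):
            s, v, h, g = ll[r][col], lh[r][col], hl[r][col], hh[r][col]
            block[2 * r][2 * col] = s + v + h + g
            block[2 * r][2 * col + 1] = s + v - h - g
            block[2 * r + 1][2 * col] = s - v + h - g
            block[2 * r + 1][2 * col + 1] = s - v - h + g
    return block
```

For the 2x2 block [[1, 2], [3, 4]] the four outputs are 2.5, -1, -0.5 and 0.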


function [bits] = get_bits_Huff(parsed)
% Uses the Huffman table to get the bits for JPEG-based compression of PARSED.

global HUFF_TABLE

bits = 0;
[r,c] = size(parsed);

for ndx = 1:r
  while (parsed(ndx,1) >= 15)
    bits = bits + 11;
    parsed(ndx,1) = parsed(ndx,1) - 15;
  end
  table_row = find((HUFF_TABLE(:,2) == parsed(ndx,1)) & ...
                   (HUFF_TABLE(:,3) == parsed(ndx,2)));
  bits = bits + HUFF_TABLE(table_row,4) + parsed(ndx,2);
end

bits = bits + 99*4;
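get_bits_Huff charges each (run, size) pair the length of its Huffman code plus size amplitude bits, adds 11 bits for every zero-run extension, and finishes with 99*4 bits, which we read (as an assumption) as a 4-bit end-of-block code for each of the 99 macroblocks. The Python sketch below mirrors that accounting with a dictionary in place of the find call; the code lengths in the usage table are placeholders, not the JPEG table. Note that in baseline JPEG the ZRL symbol stands for a run of 16 zeros, whereas the listing removes 15 per extension; the sketch keeps the listing's behaviour.

```python
ZRL_BITS = 11   # bits charged per zero-run extension in the listing
ZRL_RUN = 15    # zeros removed per extension in the listing
EOB_BITS = 4    # per-macroblock overhead (the listing adds 99*4)

def huff_bits(parsed, code_len, num_blocks=99):
    """Total bits for a list of (run, size) pairs.

    code_len maps (run, size) to the length of its Huffman code word."""
    bits = 0
    for run, size in parsed:
        while run >= ZRL_RUN:
            bits += ZRL_BITS
            run -= ZRL_RUN
        bits += code_len[(run, size)] + size   # code word + amplitude bits
    return bits + num_blocks * EOB_BITS
```

With placeholder lengths {(0, 1): 2, (0, 3): 3, (1, 1): 4}, the pairs (0, 3), (1, 1) and (16, 1) cost 6 + 5 + 16 = 27 bits before the end-of-block overhead.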


function [bits,count] = get_bits2(parsed, kind)

% Uses PARSED to fetch bits from the custom VLC tables.
% KIND will be 1 for static scenes.
% KIND will be 0 for dynamic scenes.

global VLC_DYN VLC_STA

bits = 0;
count = 0;
parsed = abs(parsed);
[r,c] = size(parsed);

if kind
  for ndx = 1:r
    poss_rows = find(VLC_STA(:,2) == parsed(ndx,1));
    start = poss_rows(1);
    poss_rows = find(VLC_STA((poss_rows(1):poss_rows(length(poss_rows))),3) == ...
                     parsed(ndx,2));

    if ~(isempty(poss_rows))
      poss_rows = poss_rows + start - 1;
      start = poss_rows(1);
    end

    if ~(isempty(poss_rows))
      poss_rows = find(VLC_STA((poss_rows(1):poss_rows(length(poss_rows))),4) == ...
                       parsed(ndx,3));
      poss_rows = poss_rows + start - 1;
    end

    if ~(isempty(poss_rows))
      bits = bits + VLC_STA(poss_rows,5);
    else
      bits = bits + 22;
      count = count + 1;
    end
  end
else
  for ndx = 1:r
    poss_rows = find(VLC_DYN(:,2) == parsed(ndx,1));
    start = poss_rows(1);
    poss_rows = find(VLC_DYN((poss_rows(1):poss_rows(length(poss_rows))),3) == ...
                     parsed(ndx,2));

    if ~(isempty(poss_rows))
      poss_rows = poss_rows + start - 1;
      start = poss_rows(1);
    end

    if ~(isempty(poss_rows))
      poss_rows = find(VLC_DYN((poss_rows(1):poss_rows(length(poss_rows))),4) == ...
                       parsed(ndx,3));
      poss_rows = poss_rows + start - 1;
    end

    if ~(isempty(poss_rows))
      bits = bits + VLC_DYN(poss_rows,5);
    else
      bits = bits + 22;
      count = count + 1;
    end
  end
end
bits = bits + 99;
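get_bits2 narrows VLC_STA or VLC_DYN one column at a time with three find calls until a row matches the absolute-valued triplet from PARSED, charges that row's code length from column 5, falls back to a fixed 22 bits (and counts the fallback) when nothing matches, and adds 99 COD bits at the end. A dictionary keyed on the whole triplet gives the same accounting in a single lookup. The Python sketch below assumes each triplet appears at most once in a table; the triplets and lengths in the usage table are placeholders, not the thesis tables. One behavioural difference: in the listing, start = poss_rows(1) raises an index error when the first column has no match at all, whereas the sketch charges the 22-bit default in that case too.

```python
DEFAULT_BITS = 22   # charged when a triplet is missing from the table
COD_BITS = 99       # one COD bit per macroblock, added once per call

def vlc_bits(parsed, table):
    """Return (bits, defaults) for a list of coefficient triplets.

    table maps an absolute-valued triplet to its code length; defaults
    counts how often the 22-bit fallback was needed."""
    bits = 0
    defaults = 0
    for triplet in parsed:
        key = tuple(abs(v) for v in triplet)   # the listing applies abs() first
        if key in table:
            bits += table[key]
        else:
            bits += DEFAULT_BITS
            defaults += 1
    return bits + COD_BITS, defaults
```

For example, two triplets found with lengths 3 and 5 plus one missing triplet give 3 + 5 + 22 + 99 = 129 bits and one default.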


function [f] = get_next_image(num)

% Reads the next frame from a .bmp file and converts it to the form needed in MATLAB

s = char('g:\pictures\Head');
fn = strcat(s,num2str(num));
f = imread(fn,'bmp');
f = double(f(:,:,1));


function [f] = get_next_image_2(num)
% Reads the next frame from a .bmp file and converts it to the form needed in MATLAB

s = char('g:\pictures\ncaa');
fn = strcat(s,num2str(num));
f = imread(fn,'bmp');
f = double(f(:,:,1));


function [Q] = get_Q1_matrix(q1)

% Makes the JPEG standard quantization matrix and multiplies it by Q1.
% q1 = 16 results in no scaling of the matrix when coupled with the rest
% of the code; q1 < 16 gives finer quantization, i.e. less quantization noise.

Q = [16 11 10 16 24 40 51 61;
     12 12 14 19 26 58 60 55;
     14 13 16 24 40 57 69 56;
     14 17 22 29 51 87 80 62;
     18 22 37 56 68 109 103 77;
     24 35 55 64 81 104 113 92;
     49 64 78 87 103 121 120 101;
     72 92 95 98 112 100 103 99] .* q1;
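get_Q1_matrix returns the JPEG luminance table multiplied by q1, and its comment says q1 = 16 leaves the table unscaled once combined with the rest of the code, which suggests a division by 16 elsewhere. The Python sketch below makes that assumed division explicit and quantizes a block by rounding half away from zero, as MATLAB's round does; Python's built-in round rounds halves to even, so a helper is needed. All names are hypothetical.

```python
import math

# JPEG luminance quantization table, as listed in get_Q1_matrix
JPEG_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def scaled_table(q1):
    """Table scaled by q1/16 (assumption: the division by 16 happens elsewhere)."""
    return [[v * q1 / 16 for v in row] for row in JPEG_LUMA]

def matlab_round(x):
    """Round half away from zero, like MATLAB's round()."""
    return math.copysign(math.floor(abs(x) + 0.5), x)

def quantize(block, table):
    """Elementwise round(block ./ table) for two 8x8 lists of lists."""
    return [[int(matlab_round(c / s)) for c, s in zip(crow, srow)]
            for crow, srow in zip(block, table)]
```

A coefficient of 40 against the unscaled entry 16 gives 2.5, which rounds to 3 here but to 2 with Python's round.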


function [out] = get_qd_entry(flag,default,entry,delta_Binter,MBnum,beta)

% Selects the appropriate quantizer triplet based on the input parameters.
% FLAG indicates the first frame following a scene-change frame. DEFAULT is the
% choice of triplet based on test sequences and serves as a starting point for a
% new sequence. ENTRY holds the table entry from the previous frame. The remaining
% parameters are as defined in the thesis.

global QD_TABLE

if (flag)
  out = min(find(default > QD_TABLE(:,4)));
else
  deltaQ = fix(delta_Binter/MBnum/beta);
  out = entry + deltaQ;
  if (out > 17)
    out = 17;
  end
  if (out < 1)
    out = 1;
  end
end
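get_qd_entry is the table-lookup step of the rate control: on the first frame after a scene change it picks the first QD_TABLE row whose fourth column is below DEFAULT; otherwise it moves ENTRY by deltaQ = fix(delta_Binter/MBnum/beta) rows and clamps the result to the range 1 to 17. A Python sketch of both branches follows; int() truncates toward zero like MATLAB's fix, and the function names are hypothetical.

```python
def first_entry(default, col4):
    """1-based index of the first row whose column-4 value is below default,
    i.e. min(find(default > QD_TABLE(:,4))); None when no row qualifies."""
    for i, value in enumerate(col4, start=1):
        if default > value:
            return i
    return None

def next_entry(entry, delta_b_inter, mb_num, beta, lo=1, hi=17):
    """Step the quantizer-table index by the truncated ratio and clamp it."""
    delta_q = int(delta_b_inter / mb_num / beta)   # truncates toward zero, like fix()
    return min(max(entry + delta_q, lo), hi)
```

For instance, delta_Binter = 3000 with MBnum = 99 and beta = 10 gives deltaQ = 3, while -3000 gives -3, because truncation is symmetric about zero.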


function [run] = get_run(seq_to_code, len)

% Called by the parsing functions PARSE_3D and PARSE_HUFF, this function obtains
% the RUN field for RLE.

run = zeros(len,1);
count = 0;
place = 1;
mdx = 1;

while (mdx <= length(seq_to_code))
  if seq_to_code(mdx)
    place = place + 1;
    mdx = mdx + 1;
  else
    while (seq_to_code(mdx) == 0)
      mdx = mdx + 1;
      count = count + 1;
    end
    run(place) = count;
    count = 0;
    place = place + 1;
    mdx = mdx + 1;
  end
end
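get_run records, for every non-zero coefficient, the number of zeros that precede it. It relies on make_it_compact having removed trailing zeros, since its inner while loop would otherwise index past the end of the sequence. A Python equivalent (hypothetical name) that returns only the filled entries rather than a zero-padded vector of length LEN:

```python
def zero_runs(seq):
    """Number of zeros preceding each non-zero entry of seq.

    Trailing zeros (none remain after make_it_compact) produce no entry."""
    runs = []
    count = 0
    for value in seq:
        if value == 0:
            count += 1
        else:
            runs.append(count)
            count = 0
    return runs
```

For [5, 0, 0, 3, 0, 7] the runs are [0, 2, 1], matching the first three entries get_run would write.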


function [output] = invdct_of_fht(input,m_vec_ndx)

% Performs the inverse 2D DCT of each 8x8 block of INPUT. M_VEC_NDX identifies
% the blocks where this operation needs to be performed.

output = zeros(792,8);
offset = -8;

for ndx = 1:99
  offset = offset + 8;
  if m_vec_ndx(ndx)
    output(offset+1:offset+8,:) = idct2(input(offset+1:offset+8,:));
  end
end
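The next function, m_blk_id_xr, combines the aging algorithm with block updating. Each macroblock carries a countdown in RV; when it reaches zero the macroblock is sent regardless of motion and the countdown restarts at a random value from 0 to 20. Otherwise the four 8x8 quadrants are tested by comparing the absolute value of their summed pixel differences with T, beginning with the quadrant recorded in LAST. The Python sketch below follows that logic under two stated simplifications: macroblocks are passed as separate 16x16 lists instead of the 1584x16 stacked matrix, and after the last-triggering quadrant the others are tried in numerical order rather than in the listing's fixed sequences. All names are hypothetical.

```python
import random

# Quadrant numbering used by m_blk_id_xr: 1 = top-left, 2 = top-right,
# 3 = bottom-right, 4 = bottom-left; values are (row, col) offsets in a macroblock.
QUADRANTS = {1: (0, 0), 2: (0, 8), 3: (8, 8), 4: (8, 0)}

def quadrant_asd(prev, curr, r0, c0, size=8):
    """Absolute value of the summed differences over one size x size quadrant."""
    total = 0
    for r in range(r0, r0 + size):
        for c in range(c0, c0 + size):
            total += prev[r][c] - curr[r][c]
    return abs(total)

def select_macroblocks(prev_mbs, curr_mbs, threshold, age, last, rng=random):
    """Return one 0/1 flag per 16x16 macroblock.

    age holds the aging countdowns and last the most recent triggering
    quadrant; both lists are updated in place, like RV and LAST."""
    flags = []
    for k, (prev, curr) in enumerate(zip(prev_mbs, curr_mbs)):
        if age[k] == 0:
            # aging: send the block regardless of motion, restart at 0..20
            flags.append(1)
            age[k] = rng.randrange(21)
            continue
        age[k] -= 1
        # simplified order: last trigger first, then the rest numerically
        order = [last[k]] + [q for q in (1, 2, 3, 4) if q != last[k]]
        flag = 0
        for q in order:
            r0, c0 = QUADRANTS[q]
            if quadrant_asd(prev, curr, r0, c0) > threshold:
                flag = 1
                last[k] = q
                break
        flags.append(flag)
    return flags
```

Starting the search at the quadrant that fired last time lets a moving object that stays in the same part of a macroblock be detected after a single 8x8 comparison.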
function [m_vec_ndx, byAge, b_count, mb_count] = m_blk_id_xr(f1,f2,T)

% Figures out which 16x16 macroblocks of the 144x176 image need to be further
% processed via an absolute sum of differences between the last frame F1 and
% the current frame F2. M_VEC_NDX is a 99x1 vector of 0s and 1s in raster-scan
% order. T is the threshold used. BYAGE is 1 if macroblocks are selected only
% due to aging and 0 otherwise. B_COUNT and MB_COUNT were used in refining the
% search sequence: B_COUNT is "block count" and MB_COUNT is "macroblock count".
% As they are artifacts now, they are commented out.

global LAST RV

%b_count = 0;
%mb_count = 0;
r_count = 0;
m_vec_ndx = zeros(99,1);

for ndx = 1:99
if (RV(ndx) == 0)
m_vec_ndx(ndx) = 1;
RV(ndx) = floor(21*rand);
else
RV(ndx) = RV(ndx) - 1;
end
end

byAge1 = sum(m_vec_ndx);


for rndx = 1:16:1584
r_count = r_count + 1;

if (~m_vec_ndx(r_count))

go = 1;
if ((LAST(r_count) == 1) & (go))
for cndx = 1

f1_8x8 = f1(rndx:rndx+7,cndx:cndx+7);
f2_8x8 = f2(rndx:rndx+7,cndx:cndx+7);
asd = abs(sum(sum(f1_8x8-f2_8x8)));

if (asd > T)
m_vec_ndx(r_count) = 1;   % ID's position
%mb_count = mb_count + 1;
%b_count = b_count + 1;
LAST(r_count) = 1;
go = 0;
break;   % gets out of the inner loop if justified now
end

f1_8x8 = f1(rndx+8:rndx+15,cndx+8:cndx+15);
f2_8x8 = f2(rndx+8:rndx+15,cndx+8:cndx+15);
asd = abs(sum(sum(f1_8x8-f2_8x8)));

if (asd > T)
m_vec_ndx(r_count) = 1;   % ID's position
%mb_count = mb_count + 1;
%b_count = b_count + 2;
LAST(r_count) = 3;
go = 0;
break;   % gets out of the inner loop if justified now
end

f1_8x8 = f1(rndx:rndx+7,cndx+8:cndx+15);
f2_8x8 = f2(rndx:rndx+7,cndx+8:cndx+15);
asd = abs(sum(sum(f1_8x8-f2_8x8)));

if (asd > T)
m_vec_ndx(r_count) = 1;   % ID's position
%mb_count = mb_count + 1;
%b_count = b_count + 3;
LAST(r_count) = 2;
go = 0;
break;   % gets out of the inner loop if justified now
end

f1_8x8 = f1(rndx+8:rndx+15,cndx:cndx+7);
f2_8x8 = f2(rndx+8:rndx+15,cndx:cndx+7);
asd = abs(sum(sum(f1_8x8-f2_8x8)));

if (asd > T)
m_vec_ndx(r_count) = 1;   % ID's position
%mb_count = mb_count + 1;
%b_count = b_count + 4;
LAST(r_count) = 4;
go = 0;
end

end % for cndx

end % if LAST


if ((LAST(r_count) == 2) & (go))
for cndx = 1

f1_8x8 = f1(rndx:rndx+7,cndx+8:cndx+15);
f2_8x8 = f2(rndx:rndx+7,cndx+8:cndx+15);
asd = abs(sum(sum(f1_8x8-f2_8x8)));

if (asd > T)
m_vec_ndx(r_count) = 1;   % ID's position
%mb_count = mb_count + 1;
%b_count = b_count + 1;
LAST(r_count) = 2;
go = 0;
break;   % gets out of the inner loop if justified now
end

f1_8x8 = f1(rndx+8:rndx+15,cndx:cndx+7);
f2_8x8 = f2(rndx+8:rndx+15,cndx:cndx+7);
asd = abs(sum(sum(f1_8x8-f2_8x8)));

if (asd > T)
m_vec_ndx(r_count) = 1;   % ID's position
%mb_count = mb_count + 1;
%b_count = b_count + 2;
LAST(r_count) = 4;
go = 0;
break;   % gets out of the inner loop if justified now
end

f1_8x8 = f1(rndx+8:rndx+15,cndx+8:cndx+15);
f2_8x8 = f2(rndx+8:rndx+15,cndx+8:cndx+15);
asd = abs(sum(sum(f1_8x8-f2_8x8)));

if (asd > T)
m_vec_ndx(r_count) = 1;   % ID's position
%mb_count = mb_count + 1;
%b_count = b_count + 3;
LAST(r_count) = 3;
go = 0;
break;   % gets out of the inner loop if justified now
end

f1_8x8 = f1(rndx:rndx+7,cndx:cndx+7);
f2_8x8 = f2(rndx:rndx+7,cndx:cndx+7);
asd = abs(sum(sum(f1_8x8-f2_8x8)));

if (asd > T)
m_vec_ndx(r_count) = 1;   % ID's position
%mb_count = mb_count + 1;
%b_count = b_count + 4;
LAST(r_count) = 1;
go = 0;
end

end % for cndx
end % if LAST


if ((LAST(r_count) == 3) & (go))
for cndx = 1
f1_8x8 = f1(rndx+8:rndx+15, cndx+8:cndx+15);
f2_8x8 = f2(rndx+8:rndx+15, cndx+8:cndx+15);
asd = abs(sum(sum(f1_8x8-f2_8x8)));
if (asd > T)
    m_vec_ndx(r_count) = 1; % ID's position
    % mb_count = mb_count + 1;
    % b_count = b_count + 1;
    LAST(r_count) = 3;
    go = 0;
    break; % gets out of the inner loop if justified now
end

f1_8x8 = f1(rndx:rndx+7, cndx:cndx+7);
f2_8x8 = f2(rndx:rndx+7, cndx:cndx+7);
asd = abs(sum(sum(f1_8x8-f2_8x8)));
if (asd > T)
    m_vec_ndx(r_count) = 1; % ID's position
    % mb_count = mb_count + 1;
    % b_count = b_count + 2;
    LAST(r_count) = 1;
    go = 0;
    break; % gets out of the inner loop if justified now
end

f1_8x8 = f1(rndx+8:rndx+15, cndx:cndx+7);
f2_8x8 = f2(rndx+8:rndx+15, cndx:cndx+7);
asd = abs(sum(sum(f1_8x8-f2_8x8)));
if (asd > T)
    m_vec_ndx(r_count) = 1; % ID's position
    % mb_count = mb_count + 1;
    % b_count = b_count + 3;
    LAST(r_count) = 4;
    go = 0;
    break; % gets out of the inner loop if justified now
end

f1_8x8 = f1(rndx:rndx+7, cndx+8:cndx+15);
f2_8x8 = f2(rndx:rndx+7, cndx+8:cndx+15);
asd = abs(sum(sum(f1_8x8-f2_8x8)));
if (asd > T)
    m_vec_ndx(r_count) = 1; % ID's position
    % mb_count = mb_count + 1;
    % b_count = b_count + 4;
    LAST(r_count) = 2;
    go = 0;
end
end % for cndx
end % if LAST


if ((LAST(r_count) == 4) & (go))
for cndx = 1
f1_8x8 = f1(rndx+8:rndx+15, cndx:cndx+7);
f2_8x8 = f2(rndx+8:rndx+15, cndx:cndx+7);
asd = abs(sum(sum(f1_8x8-f2_8x8)));
if (asd > T)
    m_vec_ndx(r_count) = 1; % ID's position
    % mb_count = mb_count + 1;
    % b_count = b_count + 1;
    LAST(r_count) = 4;
    go = 0;
    break; % gets out of the inner loop if justified now
end

f1_8x8 = f1(rndx:rndx+7, cndx+8:cndx+15);
f2_8x8 = f2(rndx:rndx+7, cndx+8:cndx+15);
asd = abs(sum(sum(f1_8x8-f2_8x8)));
if (asd > T)
    m_vec_ndx(r_count) = 1; % ID's position
    % mb_count = mb_count + 1;
    % b_count = b_count + 2;
    LAST(r_count) = 2;
    go = 0;
    break; % gets out of the inner loop if justified now
end

f1_8x8 = f1(rndx:rndx+7, cndx:cndx+7);
f2_8x8 = f2(rndx:rndx+7, cndx:cndx+7);
asd = abs(sum(sum(f1_8x8-f2_8x8)));
if (asd > T)
    m_vec_ndx(r_count) = 1; % ID's position
    % mb_count = mb_count + 1;
    % b_count = b_count + 3;
    LAST(r_count) = 1;
    go = 0;
    break; % gets out of the inner loop if justified now
end

f1_8x8 = f1(rndx+8:rndx+15, cndx+8:cndx+15);
f2_8x8 = f2(rndx+8:rndx+15, cndx+8:cndx+15);
asd = abs(sum(sum(f1_8x8-f2_8x8)));
if (asd > T)
    m_vec_ndx(r_count) = 1; % ID's position
    % mb_count = mb_count + 1;
    % b_count = b_count + 4;
    LAST(r_count) = 3;
    go = 0;
end
end % for cndx
end % if LAST


end %if ~m_vec_ndx 
end % for rndx 


byAge2 = sum(m_vec_ndx); 
byAge = (byAge1 == byAge2);
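The four LAST branches above repeat one test on the 8x8 quadrants of a 16x16 macroblock and differ only in the order in which the quadrants are visited. As a compact restatement (a hypothetical sketch, not the thesis code), the quadrant offsets and search orders can be kept in small tables. The sample frames f1 and f2, the threshold T, and the names quad, order, last, and hit are assumptions for illustration. The orders for LAST = 2, 3, and 4 are read off the branches above; the first two entries for LAST = 1 are inferred, because that branch begins before this excerpt.

```matlab
% Hypothetical restatement of the quadrant search (not the thesis code).
% Quadrant numbering follows LAST: 1 = top-left, 2 = top-right,
% 3 = bottom-right, 4 = bottom-left.
quad = [0 0; 0 8; 8 8; 8 0];      % [row col] offset of each 8x8 quadrant
order = [1 3 2 4;                  % search order when LAST == 1 (partly inferred)
         2 4 3 1;                  % LAST == 2
         3 1 4 2;                  % LAST == 3
         4 2 1 3];                 % LAST == 4

% Assumed sample data: one 16x16 macroblock from the previous (f1) and
% current (f2) frames that differ only in the bottom-left quadrant.
f1 = zeros(16);
f2 = f1;
f2(12, 3) = 50;
T = 10;                            % assumed change threshold
last = 2;                          % quadrant that triggered last time

hit = 0;                           % 0 means no quadrant exceeded T
for q = order(last, :)
    r = quad(q, 1) + (1:8);
    c = quad(q, 2) + (1:8);
    % absolute value of the summed difference, as in the code above
    asd = abs(sum(sum(f1(r, c) - f2(r, c))));
    if asd > T
        hit = q;                   % new value for LAST
        break;                     % stop at the first changed quadrant
    end
end
```

Keeping the visiting order in a table makes the search policy explicit; the `break` plays the same role as in the single-iteration `for cndx = 1` loops of the original.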


function [table] = make_HAC_table()
% Generates the Huffman VLC table: one row per (run, size) symbol, with
% columns [index, run, size, codeword length in bits]. Row 1 is run 0,
% size 0 (EOB); rows 152-162 are run 15 with sizes 0 through 10.

vec1 = ones(10,1);
vec2 = (1:10)';

index = (1:162)';

runn = [0;vec1*0;vec1;vec1*2;vec1*3;vec1*4;vec1*5;vec1*6;
        vec1*7;vec1*8;vec1*9;vec1*10;vec1*11;vec1*12;vec1*13;
        vec1*14;vec1*15;15];

siz = [0;vec2;vec2;vec2;vec2;vec2;vec2;vec2;vec2;vec2;
       vec2;vec2;vec2;vec2;vec2;vec2;0;vec2];

cw_len = [0;2;2;3;4;5;7;8;10;16;16;4;5;7;9;11;16;16;16;16;
          16;5;8;10;12;ones(6,1)*16;6;9;12;ones(7,1)*16;6;10;
          ones(8,1)*16;8;11;ones(8,1)*16;7;12;ones(8,1)*16;8;
          12;ones(8,1)*16;9;ones(9,1)*16;9;ones(9,1)*16;9;
          ones(9,1)*16;10;ones(9,1)*16;10;ones(9,1)*16;11;
          ones(19,1)*16;11;ones(10,1)*16];

table = [index, runn, siz, cw_len];
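Each row of TABLE pairs a (run, size) symbol with a codeword length. One plausible use of such a table is to estimate the AC bit cost of a block: in JPEG-style coding every symbol costs its codeword plus SIZE extra bits for the amplitude. The sketch below is hypothetical. It assumes a tiny toy table (vlc_table) with the same column layout, whose lengths are illustrative and not copied from the table above, plus a made-up symbol list. Looking rows up with `find` avoids relying on row-index arithmetic.

```matlab
% Hypothetical sketch: estimate the bits for a list of (run, size) symbols
% using a table laid out like the one above: [index, run, size, cw_len].
vlc_table = [ 1 0 0 4;     % EOB (toy lengths, for illustration only)
              2 0 1 2;
              3 0 2 2;
             12 1 1 4];
symbols = [0 2;            % [run size] pairs, e.g. rows of PARSE_HUFF output
           1 1;
           0 1;
           0 0];           % EOB

bits = 0;
for k = 1:size(symbols, 1)
    row = find(vlc_table(:,2) == symbols(k,1) & vlc_table(:,3) == symbols(k,2));
    % codeword length plus SIZE amplitude bits (zero for EOB)
    bits = bits + vlc_table(row, 4) + symbols(k,2);
end
```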


function [a] = make_it_compact(big, index, size)

% Gets rid of trailing zeros in the vector BIG that resulted from scanning 99 matrices
% of dimensions SIZExSIZE. INDEX holds the position of the last non-zero entry in each
% of the 99 matrices; an all-zero matrix (INDEX = inf) is stored as a single inf.

entry = size^2;
offset = -entry;
a = [];
for ndx = 1:99
    offset = entry + offset;
    if isinf(index(ndx))
        a = [a; inf];
    else
        a = [a; big(offset+1:offset+index(ndx))];
    end
end


function [parsed] = parse_3D(vector, index) 


% Run-length encodes VECTOR into the {last,run,level} format. INDEX is the position
% of the last non-zero entry in each of the 99 matrices.

last = [];
level = [];
run = [];

for ndx = 1:99
    if isinf(index(ndx))
        index(ndx) = 1;
    end
    point = sum(index(1:ndx));
    seq_to_code = vector(point-index(ndx)+1:point);
    len = length(seq_to_code);
    if ((len == 1) & isinf(seq_to_code))
        last = [last;0];
        level = [level;0];
        run = [run;0];
    else
        dummy = seq_to_code(find(seq_to_code));
        level = [level; dummy];
        last = [last; zeros(length(dummy)-1,1); 1];
        this_run = get_run(seq_to_code, length(dummy));
        run = [run; this_run];
    end
end

parsed = [last,run,level];
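PARSE_3D builds its events block by block and obtains the runs from its helper GET_RUN, which is not part of this excerpt. For a single scanned block the same {last, run, level} triples can be sketched with vectorized MATLAB. This is a hypothetical illustration: the sample sequence is an assumption, and the all-zero (inf) case that PARSE_3D handles separately is not covered.

```matlab
% Hypothetical vectorized sketch of the {last, run, level} events for ONE
% scanned block.
seq = [5 0 0 -2 1 0 0 0];        % assumed scanned coefficients of one block
nz = find(seq);                  % positions of the non-zero coefficients
level = seq(nz).';               % non-zero values, in scan order
run = (diff([0 nz]) - 1).';      % zeros preceding each non-zero value
last = zeros(numel(nz), 1);
if ~isempty(nz)
    last(end) = 1;               % flag the final non-zero coefficient
end
```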
function [parsed] = parse_Huff(vector, index)

% Parses VECTOR into the JPEG (run, size) format. INDEX holds the position of the
% last non-zero value of each of the 99 matrices.

last = [];
level = [];
run = [];

for ndx = 1:99
    if isinf(index(ndx))
        index(ndx) = 1;
    end
    point = sum(index(1:ndx));
    seq_to_code = vector(point-index(ndx)+1:point);
    len = length(seq_to_code);
    if ((len == 1) & isinf(seq_to_code))
        level = [level;0];
        run = [run;0];
    else
        dummy = seq_to_code(find(seq_to_code));
        level = [level;dummy];
        this_run = get_run(seq_to_code, length(dummy));
        run = [run;this_run];
    end
end

% S is the number of bits needed to represent each |level| (0 for a zero level)
level = abs(level);
S = zeros(length(level),1);

for mdx = 1:length(level)
    if level(mdx)
        S(mdx) = length(dec2bin(level(mdx)));
    else
        S(mdx) = 0;
    end
end

parsed = [run,S];
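The loop at the end of PARSE_HUFF computes the JPEG size category, i.e. the number of binary digits in |level|, by counting the characters returned by `dec2bin`. A vectorized equivalent (a sketch, with an assumed sample vector) uses floor(log2(|v|)) + 1 for non-zero values.

```matlab
% Hypothetical vectorized equivalent of the dec2bin loop in PARSE_HUFF.
level = [0; 1; -3; 4; 255];                  % assumed sample levels
S = zeros(size(level));
nz = level ~= 0;
S(nz) = floor(log2(abs(level(nz)))) + 1;     % bits needed for |level|
```

For integer levels this matches `length(dec2bin(abs(v)))`; zero levels keep S = 0 as in the loop.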


function [ll_q] = quantize_ll(ll,Q,m_vec_ndx)

% Quantizes matrix LL with quantization matrix Q. M_VEC_NDX identifies
% where this quantization needs to be performed.
%
% see UNQUANTIZE_LL

ll_q = zeros(792,8);
offset = -8;

ll = ll*16;
for ndx = 1:99
    offset = offset + 8;
    if m_vec_ndx(ndx)
        ll_q(offset+1:offset+8,:) = round(ll(offset+1:offset+8,:) ./ Q);
    end
end
function [out,rast] = raster(mat,in)
% Raster-scans horizontally (row by row) the 99 INxIN matrices stacked in MAT.
% OUT holds the position of the last non-zero entry of each scan (inf if all zero).

entry = in^2;
offset = -entry;
mat = mat.';
rast = reshape(mat,entry*99,1);

for ndx = 1:99
    offset = offset+entry;
    dummy = max(find(rast(offset+1:offset+entry)));
    if isempty(dummy)
        out(ndx) = inf;
    else
        out(ndx) = dummy;
    end
end

out = out.';


function [out] = remake_3(f_3,f_fht_ll,f_fht_lh,f_fht_hl,f_fht_hh,m_vec_ndx,in)

% Performs the inverse FHT. F_3 is the present content of the reconstructed image.
% The next four parameters are the subbands that update this content as dictated
% by the content of M_VEC_NDX. INxIN is the matrix dimensions. A subband passed
% as 0 is treated as all zeros.

half = in/2;

if (f_fht_ll == 0)
    f_fht_ll = zeros(99*half,half);
end

if (f_fht_lh == 0)
    f_fht_lh = zeros(99*half,half);
end

if (f_fht_hl == 0)
    f_fht_hl = zeros(99*half,half);
end

if (f_fht_hh == 0)
    f_fht_hh = zeros(99*half,half);
end

B = [1 1 1 1; 1 1 -1 -1; 1 -1 1 -1; 1 -1 -1 1];
x = zeros(4,1);
a = zeros(4,1);

offset = -half;
out = zeros(99*in,in);
f_3 = f_3/4;
for ndx = 1:99
    offset = offset + half;
    ost2 = offset*2;
    if m_vec_ndx(ndx)
        for rndx = 1:half
            for cndx = 1:half
                x = [f_fht_ll(rndx+offset,cndx);
                     f_fht_lh(rndx+offset,cndx);
                     f_fht_hl(rndx+offset,cndx);
                     f_fht_hh(rndx+offset,cndx)];
                a = B\x;
                f_3(rndx*2-1+ost2:rndx*2+ost2,cndx*2-1:cndx*2) = ...
                    [a(1) a(2); a(3) a(4)];
            end
        end
    end
end

out = f_3*4;
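REMAKE_3 inverts each 2x2 pixel group with a = B\x and then rescales the image by 4. The round trip below is a sketch under an assumption: the forward step is taken to be x = B*p/4, which is inferred from the /4 and *4 scaling in REMAKE_3 and is not shown in this excerpt. Because B is symmetric with B'*B = 4*eye(4), the inverse 4*(B\x) reduces to B*x.

```matlab
% Hypothetical round trip for one 2x2 pixel group (forward scaling assumed).
B = [1  1  1  1;
     1  1 -1 -1;
     1 -1  1 -1;
     1 -1 -1  1];
p = [1; 2; 3; 4];      % pixels [top-left; top-right; bottom-left; bottom-right]
x = B*p/4;             % ll sample, then the detail bands in REMAKE_3's order
p_back = 4*(B\x);      % inverse as computed in REMAKE_3
p_fast = B*x;          % same result, using B'*B = 4*eye(4) and B' = B
```

Replacing the linear solve with a multiplication by B would avoid one 4x4 solve per pixel group; the sketch only checks that both forms agree.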


function [f_out] = shape(f_in)

% Shapes the 144x176 F_IN into a 1584x16 matrix by stacking its 99 16x16
% macroblocks left to right, top to bottom (raster scans the macroblocks).

f_out = [];

for rndx = [1 17 33 49 65 81 97 113 129]
    for cndx = [1 17 33 49 65 81 97 113 129 145 161]
        f_out = [f_out; f_in(rndx:rndx+15, cndx:cndx+15)];
    end
end


function [f_out] = shape_back(f_in)
% Shapes the 1584x16 matrix back into a 144x176 image.
f_out = [];
f_row = [];
offset = -176;

for i1 = 1:9
    offset = offset + 176;
    for rndx = ([1 17 33 49 65 81 97 113 129 145 161] + offset)
        f_row = [f_row, f_in(rndx:rndx+15,:)];
    end
    f_out = [f_out; f_row];
    f_row = [];
end
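SHAPE and SHAPE_BACK move 16x16 macroblocks between a 144x176 (QCIF) frame and a 1584x16 stack by growing arrays inside loops. The same rearrangement can be sketched with `reshape` and `permute` alone. This is a hypothetical vectorized alternative; the test frame and the names A, Bk, f_mb, and f_back are assumptions.

```matlab
% Hypothetical vectorized equivalents of SHAPE and SHAPE_BACK.
f_in = reshape(1:144*176, 144, 176);              % assumed test frame
A = reshape(f_in, 16, 9, 16, 11);                 % (row in MB, MB row, col in MB, MB col)
f_mb = reshape(permute(A, [1 4 2 3]), 1584, 16);  % MBs left to right, top to bottom
Bk = reshape(f_mb, 16, 11, 9, 16);                % undo the stacking
f_back = reshape(permute(Bk, [1 3 4 2]), 144, 176);
```

Macroblock k occupies rows 16*(k-1)+1 to 16*k of f_mb, matching the loop order in SHAPE.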


function [ll] = unquantize_ll(ll_q,Q,m_vec_ndx)

% Unquantizes LL_Q with quantization matrix Q. M_VEC_NDX identifies where this
% unquantization needs to be performed.
%
% see QUANTIZE_LL

ll = zeros(792,8);
offset = -8;
for ndx = 1:99
    offset = offset + 8;
    if m_vec_ndx(ndx)
        ll(offset+1:offset+8,:) = ll_q(offset+1:offset+8,:) .* Q;
    end
end

ll = ll / 16;
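QUANTIZE_LL and UNQUANTIZE_LL scale the LL band by 16, divide by Q with rounding, and later multiply by Q and undo the scaling. For one 8x8 block the round trip can be sketched as below, assuming a flat quantization matrix and made-up coefficients. Rounding to the nearest index bounds the reconstruction error by half a step, i.e. Q/32 in the original units.

```matlab
% Hypothetical round trip through the LL scaling for one 8x8 block.
Q = 16*ones(8);                  % assumed flat quantization matrix
blk = reshape(0:63, 8, 8)/10;    % assumed sample LL coefficients
blk_q = round(blk*16 ./ Q);      % quantizer indices (as in QUANTIZE_LL)
blk_r = blk_q .* Q / 16;         % reconstruction (as in UNQUANTIZE_LL)
err = max(abs(blk_r(:) - blk(:)));
```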


function [out,vert] = vertical(mat, in)
% Raster-scans vertically (column by column, top to bottom) the 99 INxIN matrices
% contained in MAT. IN is 8 or 4. OUT holds the position of the last non-zero entry
% of each scan (inf if the matrix is all zero).

offset = -in;
offset2 = -in^2;
vert = zeros(99*in^2,1);

if (in == 8)
    for ndx = 1:99
        offset = offset + 8;
        offset2 = offset2 + 64;
        vert(offset2+1:offset2+64,1) = [mat(offset+1:offset+8,1); mat(offset+1:offset+8,2);
            mat(offset+1:offset+8,3); mat(offset+1:offset+8,4); mat(offset+1:offset+8,5);
            mat(offset+1:offset+8,6); mat(offset+1:offset+8,7); mat(offset+1:offset+8,8)];
        dummy = max(find(vert(offset2+1:offset2+64)));
        if isempty(dummy)
            out(ndx) = inf;
        else
            out(ndx) = dummy;
        end
    end
else % IN == 4
    for ndx = 1:99
        offset = offset + 4;
        offset2 = offset2 + 16;
        vert(offset2+1:offset2+16,1) = [mat(offset+1:offset+4,1); mat(offset+1:offset+4,2);
            mat(offset+1:offset+4,3); mat(offset+1:offset+4,4)];
        dummy = max(find(vert(offset2+1:offset2+16)));
        if isempty(dummy)
            out(ndx) = inf;
        else
            out(ndx) = dummy;
        end
    end
end

out = out.';
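For a single block, the horizontal scan in RASTER and the vertical scan in VERTICAL reduce to one-line reshapes, and the recorded "last non-zero position" is `find(..., 1, 'last')`. The sketch below is hypothetical and uses an assumed 4x4 block chosen so that the two scans end at different positions.

```matlab
% Hypothetical single-block equivalents of the RASTER and VERTICAL scans.
blk = [0 3 0 0;
       0 0 0 0;
       1 0 0 0;
       0 0 0 0];                  % assumed 4x4 block
horiz = reshape(blk.', [], 1);    % row by row, as RASTER does
vert = blk(:);                    % column by column, as VERTICAL does
last_h = find(horiz, 1, 'last');  % last non-zero position in each scan
last_v = find(vert, 1, 'last');
if isempty(last_h), last_h = Inf; end   % all-zero block, as in the originals
if isempty(last_v), last_v = Inf; end
```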


function [place,out] = zzb(bigmat,M)

% Zigzag scans the 99 matrices of size MxM contained in BIGMAT. PLACE holds the
% position of the last non-zero entry of each scan (inf if the matrix is all zero).

vec = zeros(M*M,1);
loop = 0;
out = [];

for ndx = 1:M:M*99
    loop = loop + 1;
    mat = bigmat(ndx:ndx+M-1,:);
    vec(1) = mat(1,1);

    % start scan at x = 2, y = 1, i.e. mat(1,2)
    x = 2;
    y = 1;
    index = 2;

    % initial scan direction
    xstep = -1;
    ystep = 1;

    % process each interior diagonal
    for i = 2:(2*M-2)
        % determine diagonal length
        if (i > M)
            len = 2*M - i;
        else
            len = i;
        end

        % run the diagonal
        for j = 1:len
            vec(index) = mat(y,x);
            % move to next point
            x = x + xstep;
            y = y + ystep;
            index = index + 1;
        end

        % set up next pass
        xstep = -xstep;
        ystep = -ystep;
        if (x == 0)
            if (y <= M)
                x = 1;
            else
                x = x + 2;
                y = M;
            end
        elseif (x > M)
            x = M;
            y = y + 2;
        end

        if (y == 0)
            if (x <= M)
                y = 1;
            else
                x = M;
                y = y + 2;
            end
        elseif (y > M)
            y = M;
            x = x + 2;
        end
    end

    vec(M*M) = mat(M,M);
    dummy = max(find(vec));

    if isempty(dummy)
        place(loop) = inf;
    else
        place(loop) = dummy;
    end

    out = [out;vec];
end

place = place';
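ZZB walks each anti-diagonal explicitly and reverses direction at the block edges. An alternative way to obtain the same zigzag order, named here plainly as a swapped-in technique rather than the thesis method, is to sort the positions by anti-diagonal number and alternate the direction on each diagonal. The test block and the names s, key, and ord are assumptions.

```matlab
% Hypothetical alternative to the diagonal walk in ZZB: sort positions by
% anti-diagonal, alternating direction from one diagonal to the next.
M = 4;
mat = reshape(1:M*M, M, M).';      % assumed test block, mat(r,c) = (r-1)*M + c
[R, C] = ndgrid(1:M, 1:M);
s = R(:) + C(:);                   % anti-diagonal number of each position
key = R(:);                        % odd s: rows increase (top-right to bottom-left)
even = mod(s, 2) == 0;
key(even) = -key(even);            % even s: rows decrease (bottom-left to top-right)
[~, ord] = sortrows([s key]);      % linear indices in zigzag order
zz = mat(ord);
```

The index vector ord depends only on M, so it could be computed once and reused for all 99 blocks.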